Full Width [alt+shift+f] Shortcuts [alt+shift+k]
Sign Up [alt+shift+s] Log In [alt+shift+l]
22
Last year, my coworker Suzanne and I got a case study accepted into an ethnography conference! She's an anthropologist by training, and I'm a software engineer. How'd we wind up there? The short version is I asked her what I thought was an interesting question, and she thought it was an interesting question, and we wound up doing research. We have a lot more to go on the original question—we're studying the relationship between the values we hold as engineers and our documentation practices—and this case study is the first step in that direction. The title of the case study is a mouthful: "Foundations of Practice: An Ethnographic Exploration of Software Documentation and its Cultural Implications for an Agile Startup". I mean, it's intended for a more academic audience, and in a field that's not software engineering. But what we studied is relevant for software engineers, so let's talk about it here in more familiar language! If you want to read the full case study, it's open...
a month ago

Improve your reading experience

Logged in users get linked directly to articles resulting in a better reading experience. Please login for free, it takes less than 1 minute.

More from ntietz.com blog - technically a blog

Stewardship over ownership

Code ownership is a popular concept, but it emphasizes the wrong thing. It can bring out the worst in a person or a team: defensiveness, control-seeking, power struggles. Instead, we should be focusing on stewardship. How code ownership manifests Code ownership as a concept means that a particular person or team "owns" a section of the codebase. This gives them certain rights and responsibilities: They control what goes into the code, and can approve or deny changes They are responsible for fixing bugs in that part of the code They are responsible for maintaining and improving that part of the code There are tools that help with these, like the CODEOWNERS file on GitHub. This file lets you define a group or list of individuals who own a section of the repository. Then you can require reviews/approvals from them before anything gets merged. These are all coming from a good place. We want our code to be well-maintained, and we want to make sure that someone is responsible for its direction. It really helps to know who to go to with questions or requests. Without these, changes can grind to a halt, mired in confusion and tech debt. But the concept in practice brings challenges. If you've worked on a team using code ownership before, you've probably run into: that engineer who guards the code against anyone else's changes, wanting all the credit for themselves that engineer who refuses to add anything else to their codebase, because they don't want to maintain it that engineer who tries to gain code ownership over more areas, to control more of the code and more of the company I've done certainly acted badly due to code ownership, without realizing what I was doing or or why I was doing it at the time. There are almost endless ways that code ownership can bring out the worst in people. And it all makes sense. We can do better by shifting to stewardship instead of ownership. Stewardship is about service We are all stewards of things we own or are responsible for. I have stewardship over the house I live in with my family, for example. I also have stewardship over the espresso machine I use every day: It's a big piece of machinery, and it's my responsibility to take good care of it and to ensure that as long as it's mine, it operates well and lasts a long time. That reduces expense, reduces waste, and reduces impact on the world—but it also means that the object (an espresso machine) is serving its purpose to bring joy and connection. Code is no different. By focusing on stewardship rather than ownership, we are focusing on the responsible, sustainable maintenance of the code. We focus on taking good care of that which we're entrusted with. A steward doesn't jealously guard, or struggle to gain more power. A steward watches what her responsibilities are, ensuring enough to contribute but not so many as to burn out. And she nurtures and cares for the code, to make sure that it continues to serve its purpose. Instead of an adversarial relationship, stewardship promotes partnership: It promotes working with others to figure out how to make the best use of resources, instead of hoarding them for yourself. Stewardship can solve many of the same problems that code ownership does: It gives you someone who's a main point of contact for some code It grants someone responsibility for bug fixes and maintenance of that code And in some ways, they look alike. You're going to do a lot of the same things, controlling what goes in or out. But they are very different in the focus. Owners are concerned with the value of what they own. Stewards are concerned with how well it can serve the group. And this makes all the difference in producing better outcomes.

4 days ago 7 votes
Some things that make Rust lifetimes hard to learn

After I wrote YARR (Yet Another Rust Resource, with requisite pirate mentions), one of my friends tried it out. He gave me some really useful insights as he went through it, letting me see what was hard about learning Rust from a newcomer's perspective. Unsurprisingly, lifetimes are a challenge—and seeing him go through it helped me understand why they're hard to learn. Here are a few of the challenges he ran into. I don't think that these are necessarily problems, but they're perhaps opportunities to improve educational materials. They don't map 100% to how long a variable is in memory My friend gave me an example he's seen a few times when people explain lifetimes. fn longest<'a>(x: &'a str, y: &'a str) -> &'a str { if x.len() > y.len() { x } else { y } } And for many newcomers, you see this and you expect it is saying that x and y both have the lifetime 'a, so they live the same amount of time. But the following is valid: fn print_longest(x: &'static str) { let y = "local"; let a = longest(x, y); println!("{a}"); drop(a); drop(y); println!("y is gone"); } In this example, x and y live for different amounts of time. y doesn't even survive to the end of the function, whereas x should be valid for the entire duration of the program. That's because lifetimes are talking about a bound on the time something can live. There's some lifetime 'a during which we can say that x and y are both certainly valid. But x and y can both live longer than 'a. Lifetimes don't change the runtime behavior Most code we write changes what the program does at runtime. Types can be different, because sometimes you're giving the compiler information about what something is. But most type information can change the runtime behavior! The simplest example is when you have an integer. You can declare one without a type. let x = 10; This has an inferred type, and if you set a different type, like u8, you'll get different behavior at run time. let x: u8 = 10; In contrast, lifetimes are only used by the compiler to ensure that borrows are all valid. The compiler can reject your program if invalid borrows are performed, but the binary output should not be affected by the lifetimes of the variables. It's a different kind of type system We're used to seeing types in our programming languages, and these type systems are usually pretty similar. Rust's lifetimes are different, though. The borrow checker uses a linear type system to do its work. These are super cool, and something that I don't understand particularly well. I'm familiar with how to use the borrow checker, but I don't know any of the theory behind them. The premise, as I understand it, is that objects can be used exactly once, allowing you to safely deallocate it after use (since it won't be used again). This prevents multiple concurrent uses (yay, data race protection!) or use-after-free (yay, segfault protection!). The coolness is why we have it, but it's still pretty tough to understand. You have to learn this whole new type system that's pretty different from everything else you've touched. And most of the resources1 out there don't even mention that it's a different kind of type system! They share syntax with generics Another challenge is that the syntax is shared with generics. Even though lifetimes are very different in behavior and type system from generics, they sit inside very similar looking syntax. This is probably unavoidable—lifetimes are related to all the other types in your code—but it certainly makes things harder to learn. When you see something like this, you expect that it's generic over a type. fn something_generic<T>(arg: T) { ... } And you're right that it is! But then you have something that looks very similar, like this. And you might expect it to also be generic over a type. fn something_generic<'a>(arg: &'a str) { ... } But it's not, in the normal sense. Instead it's generic over a lifetime. And that's a little confusing that those sit in the same spot, especially when it's not called out as a potential gotcha in learning materials. * * * Lifetimes have some inherent complexity. The borrow checker is a very valuable tool, and it's great we have it! But with that power and complexity can come challenges in learning, and teaching, the underlying concepts. I think the current difficulty in learning Rust is due to a lot of things. One aspect is certainly some inherent complexity. But another aspect is that many resources aren't really geared toward the kind of programmer coming to Rust without this background knowledge, and there is room for improvement. We can make explanations of lifetimes and the borrow checker better and less confusing. Or we can at least make them more empathetic, projecting that it's expected to be confused because there are some good reasons it's hard to understand. And that you'll get there, eventually. Thank you, Ryan, for generously sharing your thoughts as you went through learning Rust. Our conversations were instrumental in writing this post. 1 I suppose, as the author of YARR, I can fix this in at least one instance.

a week ago 8 votes
Your product shouldn't require showing my legal name

Last week, I finally got verified on LinkedIn. Now there's a little badge next to my name that says "yes, she's a human who is legally named Nicole." Their marketing for verification says that I should now expect 60% more profile views and 50% more comments and reactions. For a writer like me, that seems great. More people viewing my content means more people can learn from me, or be entertained by me. And all that for free? There's a problem, of course. Nicole is my legal name, so I was able to get verified as a result. But many people don't go by their legal name. Other names are common So who doesn't go by their legal name? I didn't, for years after my wife and I got married. I went by an alias—our hyphenated last name—without legally changing it. I wasn't eligible to get verified then, since that name was not on my ID. I didn't, when I came out as transgender. It takes time to change your name and update your documents. Until that was complete, I would have had to go by my deadname or lose verification. And what about women who change their name when they get married, but go by their previous name professionally? This is an alias, and it is their name even though it's not what the government knows them as. But they would lose verification for doing this. Anyone who goes by a nickname or alias is ineligible. I have many friends who go by a different name than what their ID shows. This isn't fraudulent—it just reflects who they are. Penalized for being yourself And yet, if you fall into any case where you cannot get verified, then you can't get the benefits. You can't get your extra profile views and extra comments. Or put another way: you're penalized for not being verified. You'll get about 40% fewer profile views and 33% fewer comments/reactions than people who are verified. You get marginalized, unable to reap the full benefits of the platform, if you don't conform to a very particular outlook on what a name is (the official sequence of letters on your ID). To be clear, the problem isn't the verification process itself1. That process (and its associated benefits) may be in place to deal with bot traffic. I can sympathize with this, and I do want lower bot traffic—it makes platforms much more pleasant to use. Let us use our names The problem is that you have to show your legal name to everyone. There should be a process for being verified without it being your legal name on display. This process doesn't have to be scalable if the group that would utilize it is small—since surely they'd only forget about small populations2. It can be as simple as filing a help ticket and allowing a human to approve it based on some evidence. Is your name consistent across your public profiles? And you're a human being? Cool, verified. I believe that LinkedIn can, and should, do better. This feature as implemented is harming marginalized folks who are not able to get the same visibility when they cannot get verified. It reduces the exposure that marginalized creators can get. You should keep this in mind in the products you make, too. Don't require people to display their legal names. And before you even collect that data, think about what problem you're trying to solve. Do you need to collect legal names to solve that? (Probably not.) If so, do you need to store them after processing once? (Probably not.) And if so, do you need to display them publicly? (Probably not.) Names are so much more than what the government knows us by. Let us be our true selves and verify us with our true names. 1 It's not free of problems, though: I'd like to have a way to achieve the same result ("she's a human! she's generally the internet person she claims she is!") without showing my government identity documents to a third party. 2 Though, companies have been known to marginalize large groups of people. This is a rhetorical point that they are either harming a lot of people or they could solve the problem for a low cost.

2 weeks ago 16 votes
Can I ethically use LLMs?

The title is not a rhetorical question, and I'm not going to bury an answer. I don't have an answer. This post is my exploration of the question, and why I think it is a question1. Important things up front: what's my relationship with LLMs today? I don't use any LLMs regularly. I do have access to GitHub Copilot through my employer. I have it available on a hotkey, I think, and I cannot remember how to trigger it since I do not use it. I've explored using LLMs in the past. I used to be a regular Copilot user, and I explored ChatGPT, Claude, etc. to see what their capabilities were. I have done trainings for my coworkers on how to use them effectively, though I would not feel comfortable doing so now. My employer's product uses LLMs. I don't want to link to my employer, but yeah, I guess my paycheck depends on being okay with integrating them in? It's complicated (a refrain). (This post is, obviously, my opinions and does not reflect my employer.) I don't think using or not using them is a moral failing. There is a lot of moralizing around LLM usage. I'm not doing any of that here. I have my own beliefs (or my own questions), but I don't think people using LLMs are immoral (or vice versa). So, you can see I have used them and I'm not absolutist, but I don't use them today. Why not? Why did I stop using them in spite of the advances, where they're more capable than ever? It's because of these questions and issues. Where I have undecided ethical questions, I lean toward the more conservative2 choice of not using them until I have clarity on the ethics. (Note: I am not inviting folks to email me with answers to this question.) Energy usage Another technology that uses a lot of energy is blockchains. I think using public blockchains is almost universally unethical since there are other, better, less harmful options. Part of the harm from blockchains is an absurd amount of energy usage. LLMs also use a lot of energy. This can be split into training and inference energy usage. These vary based on the model. Some models can run locally on Apple silicon, and those are lower energy usage—their upper bound is running your computer full tilt, and an M4 Mac mini's max power consumption is 65 watts. This is roughly equivalent to one incandescent light bulb, or 8 LED light bulbs. It's good to turn off unnecessary lights, but doing so isn't going to solve the climate crisis; we need bigger, more sweeping reforms. I don't think that local models are going to significantly alter the climate crisis in either direction. Other models are massive and run in data centers on lots of power-hungry GPUs3. These data centers also require construction, and that comes with its own environmental impact. An article from Tom's Guide last year showed that "a single query on ChatGPT-4 can use up to 3 bottles of water, that a year of queries uses enough electricity to power over nine houses". A lot of the cost comes from new data centers being built. The demand for LLMs has led to more demand for power generation and more demand for data centers. And this new power generation is coming from gas-fired plants instead of sustainable, clean energy sources, because that's all we can build fast enough. A lot of attention is given to the training side. The numbers for training are large and shocking: Llama 3 used 500 mWh and GPT-3 training used 1,287 mWh—even more if you include the cost of training failed models which preceded these, the experiments that made the models possible. The listed figures are high, and 500 mWh is about the cost of a large jet flying for 7 hours. But we do it once per foundational model, and then the cost is spread across all the remaining usage. I don't think that the training side is significantly shifting the equation on climate change. We'd have a much larger impact on improving the climate crisis by advocating for remote work—reducing vehicles on the road, making many flights unnecessary—than by not training models. Overall it feels to me like local models have a clearly acceptable impact, and data center models have higher energy usage but still probably do not change the situation very much. Training data The training data for LLMs has largely been lifted without the consent of the people who generated that data. This is a lot of writing, music, videos, visual art, all of it. There are some attempts at there at using licensed data only, but the majority of models, and the most popular ones, are unlicensed data. Now the question is: is this an ethical problem? I know there are opinions on both sides of this. Some say that this data is publicly visible on the internet (though some of the data was not on the internet), and so it's fair game. Others say that this use isn't one that people consented to, and it should require that consent. My thought experiment is this: If we made search engines today, would people have this same objection to a search engine using their data without their express consent? I think most people would ultimately support search engines. They are different than LLMs, because they (mostly) serve results that point you to the original source, rather than create new content for you to replace the original sources with. But maybe people would reject search engines. And maybe consistency between these two isn't necessary, or maybe it's a false consistency—there could be other differentiating details that lead to different answers. Where I come down is that I think we need a robust mechanism to opt out of use of data for training, but that it's probably fine to train with publicly available data on the internet. What you do with the trained model is another question entirely. When you try to replace people making original works instead of creating an entirely new function, that's where you get questions. "Replacing" people There's a lot of LLM usage that is just trying to replace people in entire jobs. I mean, it feels like all of it. We see LLMs that are meant to replace writers and editors, artists and illustrators, musicians and songwriters. It doesn't say that directly—it says you're empowered to create things yourself. But what it means is that people should be able to press a few buttons and, with no artist involved, get out a beautiful artwork. Sounds like replacing to me. This is something we've long done with technology. We make technology that puts people out of jobs, and that was the whole industrial revolution. That's what happened with shipping from the containerization of ports, putting many dockworkers out of jobs. Replacing people in the abstract is not unethical. The problem is if we fail to deal with the harm created from replacing people. And people will be harmed, because losing your job or the value of it going down has a serious impact on quality of life. When we put people out of work, we—both society and technologists—have an ethical responsibility to ensure there's a plan to mitigate the harm from that. Maybe that means grants for living expenses while people switch fields, if put out of work in an LLM-heavy industry. Maybe that means a universal basic income. But it certainly doesn't mean doing nothing. Incorrect information and bias One of the major well-known problems of LLMs is a tendency to "hallucinate," or to confidently state facts that are made up from whole cloth. They also have an unknown amount of bias, with unknown mitigations in place, due to being closed systems. This is a big problem! We're not good at seeing what information is incorrect in something that's generated to look like it's the most likely string of tokens. If there is incorrect information in there, we'll just miss it. This means that people can make poor decisions on the basis of what an LLM tells them. They can have lost income due to its mistakes. And the bias? That's a huge problem, because it means that we don't know if this system will reinforce existing problematic norms4. We don't know what it will reinforce, because the training data is closed and there's not a lot of public evaluation on bias. So ultimately, we're left with an unknown harm of unknown magnitude to an unknown population5. Concentrating power (with the wealthy elite) A big ethical concern for me is also what this will do to our entire society. Many technologies are heralded as "democratizing" things. Spotify "democratized" music by making it so that anyone can get listens—but, y'all, it ended up flattening the tail and making the popular artists more money while making small artists less money. Will LLMs do the same? We know that the big models need large data centers to run their training and inference. And even small models need beefy hardware to run inference, let alone training! We have some access to models which can run locally, which is a good step. But the problem is that we can only run other people's models. They'll have those people's decisions baked in, decisions on which data to include or to exclude, decisions on how to approach questions of bias and abuse. And when the hardware to run the biggest models is only accessible to a few companies, that means that those companies really get a lot more powerful. OpenAI and Anthropic and Google and Meta all have the ability to run really large models. I certainly don't, though. This means that a technology that many are heralding as making things more accessible to everyone is controlled by a small handful of people. A small handful of people can decide how the models are trained, and set policies on how they're used. In a time when the US government is trying to get any paper retracted that mentioned queer people, and erasing trans people from the Stonewall monument, it feels self-evident that letting a small group of people control this technology imperils the future of many people. * * * Ultimately, I want robots to do the things I don't want to do. I want them to do my dishes and my laundry. I don't want them to play music instead of me, write code instead of me, write words instead of me. I am not sure whether or not using LLMs is unethical. There are certainly ways of using them which are unquestionably unethical—as is true with every technology. And there are ways of developing LLMs which are unethical—as is true with every technology. But the problems with them are large. I think it is unethical to use them without addressing the ethical questions above. If you're not working on mitigating the harms from LLMs (which do exist), then you might be doing something unethical. 1 I've been in the interesting situation of having anti-LLM people think I'm pro-LLM and vice versa. It's a very weird feeling, and makes me a little nervous of posting this! But this is an important question and an important conversation. 2 Footnoting out of an abundance of caution: I don't meant conservative-like-Republicans, because, no, I'd like to keep my rights thank you very much. I just mean in terms of minimizing risk. Let's please stop attacking trans rights, immigrants, Palestinians, and you know, everyone else that I've forgotten because the whole world seems to be on fire. 3 If we ever achieve artificial sentience, this sentence may acquire a second, more sinister, meaning. 4 It will. 5 My guess is a large harm to all underrepresented groups, but who asked the trans woman?

3 weeks ago 19 votes
What's in a ring buffer? And using them in Rust

Working on my cursed MIDI project, I needed a way to store the most recent messages without risking an unbounded amount of memory usage. I turned to the trusty ring buffer for this! I started by writing a very simple one myself in Rust, then I looked and what do you know, of course the standard library has an implementation. While I was working on this project, I was pairing with a friend. She asked me about ring buffers, and that's where this post came from! What's a ring buffer? Ring buffers are known by a few names. Circular queues and circular buffers are the other two I hear the most. But this name just has a nice ring to it. It's an array, basically: a buffer, or a queue, or a list of items stacked up against each other. You can put things into it and take things out of it the same as any buffer. But the front of it connects to the back, like any good ring1. Instead of adding on the end and popping from the end it, like a stack, you can add to one end and remove from the start, like a queue. And as you add or remove things, where the start and end of the list move around. Your data eats its own tail. This lets us keep a fixed number of elements in the buffer without running into reallocation. In a regular old buffer, if you use it as a queue—add to the end, remove from the front—then you'll eventually need to either reallocate the entire thing or shift all the elements over. Instead, a ring buffer lets you just keep adding from either end and removing from either end and you never have to reallocate! Uses for a ring buffer My MIDI program is one example of where you'd want a ring buffer, to keep the most recent items. There are some general situations where you'll run into this: You want a fixed number of the most recent things, like the last 50 items seen You want a queue, especially with a fixed maximum number of queued elements2 You want a buffer for data coming in with an upper bound, like with streaming audio, and you want to overwrite old data if the consumer can't keep up for a bit A lot of it comes down to performance, and streaming data. Something is producing data, something else is consuming it, and you want to make sure insertions and removals are fast. That was exactly my use: a MIDI device produces messages, my UI consumes them, but I don't want to fill up all my memory, forever, with more of them. How ring buffers work So how do they work? This is a normal, non-circular buffer: When you add something, you put it on the end. If you're using it as a stack, then you can pop things off the end. But if you're using it as a queue, you pop things off the front... And then you have to shuffle everything to the left. That's why we have ring buffers! I'll draw it as a straight line here, to show how it's laid out in memory. But the end loops back to the front logically, and we'll see how it wraps around. We start with an empty buffer, and start and end both point at the first element. When start == end, we know the buffer is empty. If we insert an element, we move end forward. And it gets repeated as we insert multiple times. If we remove an element, we move start forward. We can also start the buffer at any point, and it crosses over the end gracefully. Ring buffers are pretty simple when you get into their details, with just a few things to wrap your head around. It's an incredibly useful data structure! Rust options If you want to use one in Rust, what are your options? There's the standard library, which includes VecDeque. This implements a ring buffer, and the name comes from "Vec" (Rust's growable array type) combined with "Deque" (a double-ended queue). As this is in the standard library, this is quite accessible from most code, and it's a growable ring buffer. This means that the pop/push operations will always be efficient, but if your buffer is full it will result in a resize operation. The amortized running time will still be O(1) for insertion, but you incur the cost all at once when a resize happens, rather than at each insertion. You can enforce size limits in your code, if you want to avoid resize operations. Here's an example of how you could do that. You check if it's full first, and if so, you remove something to make space for the new element. let buffer: VecDeque<u32> = VecDeque::with_capacity(10); for thing in 0..15 { // if the buffer is already full, remove the first element to make space if buffer.len() == 10 { buffer.pop_front(); } buffer.push_back(thing); } There are also libraries that can enforce this for you! For example, circular-buffer implements a ring buffer. It has a fixed max capacity and won't resize on you, instead overwriting elements when you run out of space. The size is set at compile time, though, which can be great or can be a constraint that's hard to meet. There is also ringbuffer, which gives you a fixed size buffer that's heap allocated at runtime. That buys you some flexibility, with the drawback of being heap-based instead of stack-based. I'm definitely reaching for a library when I need a non-growable ring buffer in Rust now. There are some good choices, and it saves me the trouble of having to manually enforce size limits. 1 One of my favorite rings represents the sun and the moon, with the sun nestling inside this crescent moon shape. Unfortunately, the ring is open there. It makes for a very nice visual effect, but it kept getting snagged on things and bending. So it's not a very good ring for living an active hands-on life. 2 These can also be resizable! The Rust standard library one is. This comes with reallocation cost, which amortizes to a low cost but can be expensive in individual moments.

a month ago 18 votes

More in programming

ChatGPT Would be a Decent Policy Advisor

Revealed: How the UK tech secretary uses ChatGPT for policy advice by Chris Stokel-Walker for the New Scientist

16 hours ago 3 votes
Setting policy for strategy.

This book’s introduction started by defining strategy as “making decisions.” Then we dug into exploration, diagnosis, and refinement: three chapters where you could argue that we didn’t decide anything at all. Clarifying the problem to be solved is the prerequisite of effective decision making, but eventually decisions do have to be made. Here in this chapter on policy, and the following chapter on operations, we finally start to actually make some decisions. In this chapter, we’ll dig into: How we define policy, and how setting policy differs from operating policy as discussed in the next chapter The structured steps for setting policy How many policies should you set? Is it preferable to have one policy, many policies, or does it not matter much either way? Recurring kinds of policies that appear frequently in strategies Why it’s valuable to be intentional about your strategy’s altitude, and how engineers and executives generally maintain different altitudes in their strategies Criteria to use for evaluating whether your policies are likely to be impactful How to develop novel policies, and why it’s rare Why having multiple bundles of alternative policies is generally a phase in strategy development that indicates a gap in your diagnosis How policies that ignore constraints sound inspirational, but accomplish little Dealing with ambiguity and uncertainty created by missing strategies from cross-functional stakeholders By the end, you’ll be ready to evaluate why an existing strategy’s policies are struggling to make an impact, and to start iterating on policies for strategy of your own. This is an exploratory, draft chapter for a book on engineering strategy that I’m brainstorming in #eng-strategy-book. As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts. What is policy? Policy is interpreting your diagnosis into a concrete plan. That plan will be a collection of decisions, tradeoffs, and approaches. They’ll range from coding practices, to hiring mandates, to architectural decisions, to guidance about how choices are made within your organization. An effective policy solves the entirety of the strategy’s diagnosis, although the diagnosis itself is encouraged to specify which aspects can be ignored. For example, the strategy for working with private equity ownership acknowledges in its diagnosis that they don’t have clear guidance on what kind of reduction to expect: Based on general practice, it seems likely that our new Private Equity ownership will expect us to reduce R&D headcount costs through a reduction. However, we don’t have any concrete details to make a structured decision on this, and our approach would vary significantly depending on the size of the reduction. Faced with that uncertainty, the policy simply acknowledges the ambiguity and commits to reconsider when more information becomes available: We believe our new ownership will provide a specific target for Research and Development (R&D) operating expenses during the upcoming financial year planning. We will revise these policies again once we have explicit targets, and will delay planning around reductions until we have those numbers to avoid running two overlapping processes. There are two frequent points of confusion when creating policies that are worth addressing directly: Policy is a subset of strategy, rather than the entirety of strategy, because policy is only meaningful in the context of the strategy’s diagnosis. For example, the “N-1 backfill policy” makes sense in the context of new, private equity ownership. The policy wouldn’t work well in a rapidly expanding organization. Any strategy without a policy is useless, but you’ll also find policies without context aren’t worth much either. This is particularly unfortunate, because so often strategies are communicated without those critical sections. Policy describes how tradeoffs should be made, but it doesn’t verify how the tradeoffs are actually being made in practice. The next chapter on operations covers how to inspect an organization’s behavior to ensure policies are followed. When reworking a strategy to be more readable, it often makes sense to merge policy and operation sections together. However, when drafting strategy it’s valuable to keep them separate. Yes, you might use a weekly meeting to review whether the policy is being followed, but whether it’s an effective policy is independent of having such a meeting, and what operational mechanisms you use will vary depending on the number of policies you intend to implement. With this definition in mind, now we can move onto the more interesting discussion of how to set policy. How to set policy Every part of writing a strategy feels hard when you’re doing it, but I personally find that writing policy either feels uncomfortably easy or painfully challenging. It’s never a happy medium. Fortunately, the exploration and diagnosis usually come together to make writing your policy simple: although sometimes that simple conclusion may be a difficult one to swallow. The steps I follow to write a strategy’s policy are: Review diagnosis to ensure it captures the most important themes. It doesn’t need to be perfect, but it shouldn’t have omissions so obvious that you can immediately identify them. Select policies that address the diagnosis. Explicitly match each policy to one or more diagnoses that it addresses. Continue adding policies until every diagnosis is covered. This is a broad instruction, but it’s simpler than it sounds because you’ll typically select from policies identified during your exploration phase. However, there certainly is space to tweak those policies, and to reapply familiar policies to new circumstances. If you do find yourself developing a novel policy, there’s a later section in this chapter, Developing novel policies, that addresses that topic in more detail. Consolidate policies in cases where they overlap or adjoin. For example, two policies about specific teams might be generalized into a policy about all teams in the engineering organization. Backtest policy against recent decisions you’ve made. This is particularly effective if you maintain a decision log in your organization. Mine for conflict once again, much as you did in developing your diagnosis. Emphasize feedback from teams and individuals with a different perspective than your own, but don’t wholly eliminate those that you agree with. Just as it’s easy to crowd out opposing views in diagnosis if you don’t solicit their input, it’s possible to accidentally crowd out your own perspective if you anchor too much on others’ perspectives. Consider refinement if you finish writing, and you just aren’t sure your approach works – that’s fine! Return to the refinement phase by deploying one of the refinement techniques to increase your conviction. Remember that we talk about strategy like it’s done in one pass, but almost all real strategy takes many refinement passes. The steps of writing policy are relatively pedestrian, largely because you’ve done so much of the work already in the exploration, diagnosis, and refinement steps. If you skip those phases, you’d likely follow the above steps for writing policy, but the expected quality of the policy itself would be far lower. How many policies? Addressing the entirety of the diagnosis is often complex, which is why most strategies feature a set of policies rather than just one. The strategy for decomposing a monolithic application is not one policy deciding not to decompose, but a series of four policies: Business units should always operate in their own code repository and monolith. New integrations across business unit monoliths should be done using gRPC. Except for new business unit monoliths, we don’t allow new services. Merge existing services into business-unit monoliths where you can. Four isn’t universally the right number either. It’s simply the number that was required to solve that strategy’s diagnosis. With an excellent diagnosis, your policies will often feel inevitable, and perhaps even boring. That’s great: what makes a policy good is that it’s effective, not that it’s novel or inspiring. Kinds of policies While there are so many policies you can write, I’ve found they generally fall into one of four major categories: approvals, allocations, direction, and guidance. This section introduces those categories. Approvals define the process for making a recurring decision. This might require invoking an architecture advice process, or it might require involving an authority figure like an executive. In the Index post-acquisition integration strategy, there were a number of complex decisions to be made, and the approval mechanism was: Escalations come to paired leads: given our limited shared context across teams, all escalations must come to both Stripe’s Head of Traffic Engineering and Index’s Head of Engineering. This allowed the acquired and acquiring teams to start building trust between each other by ensuring both were consulted before any decision was finalized. On the other hand, the user data access strategy’s approval strategy was more focused on managing corporate risk: Exceptions must be granted in writing by CISO. While our overarching Engineering Strategy states that we follow an advisory architecture process as described in Facilitating Software Architecture, the customer data access policy is an exception and must be explicitly approved, with documentation, by the CISO. Start that process in the #ciso channel. These two different approval processes had different goals, so they made tradeoffs differently. There are so many ways to tweak approval, allowing for many different tradeoffs between safety, productivity, and trust. Allocations describe how resources are split across multiple potential investments. Allocations are the most concrete statement of organizational priority, and also articulate the organization’s belief about how productivity happens in teams. Some companies believe you go fast by swarming more people onto critical problems. Other companies believe you go fast by forcing teams to solve problems without additional headcount. Both can work, and teach you something important about the company’s beliefs. The strategy on Uber’s service migration has two concrete examples of allocation policies. The first describes the Infrastructure engineering team’s allocation between manual provision tasks and investing into creating a self-service provisioning platform: Constrain manual provisioning allocation to maximize investment in self-service provisioning. The service provisioning team will maintain a fixed allocation of one full time engineer on manual service provisioning tasks. We will move the remaining engineers to work on automation to speed up future service provisioning. This will degrade manual provisioning in the short term, but the alternative is permanently degrading provisioning by the influx of new service requests from newly hired product engineers. The second allocation policy is implicitly noted in this strategy’s diagnosis, where it describes the allocation policy in the Engineering organization’s higher altitude strategy: Within infrastructure engineering, there is a team of four engineers responsible for service provisioning today. While our organization is growing at a similar rate as product engineering, none of that additional headcount is being allocated directly to the team working on service provisioning. We do not anticipate this changing. Allocation policies often create a surprising amount of clarity for the team, and I include them in almost every policy I write either explicitly, or implicitly in a higher altitude strategy. Direction provides explicit instruction on how a decision must be made. This is the right tool when you know where you want to go, and exactly the way that you want to get there. Direction is appropriate for problems you understand clearly, and you value consistency more than empowering individual judgment. Direction works well when you need an unambiguous policy that doesn’t leave room for interpretation. For example, Calm’s policy for working in the monolith: We write all code in the monolith. It has been ambiguous if new code (especially new application code) should be written in our JavaScript monolith, or if all new code must be written in a new service outside of the monolith. This is no longer ambiguous: all new code must be written in the monolith. In the rare case that there is a functional requirement that makes writing in the monolith implausible, then you should seek an exception as described below. In that case, the team couldn’t agree on what should go into the monolith. Individuals would often make incompatible decisions, so creating consistency required removing personal judgment from the equation. Sometimes judgment is the issue, and sometimes consistency is difficult due to misaligned incentives. A good example of this comes in strategy on working with new Private Equity ownership: We will move to an “N-1” backfill policy, where departures are backfilled with a less senior level. We will also institute a strict maximum of one Principal Engineer per business unit. It’s likely that hiring managers would simply ignore this backfill policy if it was stated more softly, although sometimes less forceful policies are useful. Guidance provides a recommendation about how a decision should be made. Guidance is useful when there’s enough nuance, ambiguity, or complexity that you can explain the desired destination, but you can’t mandate the path to reaching it. One example of guidance comes from the Index acquisition integration strategy: Minimize changes to tokenization environment: because point-of-sale devices directly work with customer payment details, the API that directly supports the point-of-sale device must live within our secured environment where payment details are stored. However, any other functionality must not be added to our tokenization environment. This might read like direction, but it’s clarifying the desired outcome of avoiding unnecessary complexity in the tokenization environment. However, it’s not able to articulate what complexity is necessary, so ultimately it’s guidance because it requires significant judgment to interpret. A second example of guidance comes in the strategy on decomposing a monolithic codebase: Merge existing services into business-unit monoliths where you can. We believe that each choice to move existing services back into a monolith should be made “in the details” rather than from a top-down strategy perspective. Consequently, we generally encourage teams to wind down their existing services outside of their business unit’s monolith, but defer to teams to make the right decision for their local context. This is another case of knowing the desired outcome, but encountering too much uncertainty to direct the team on how to get there. If you ask five engineers about whether it’s possible to merge a given service back into a monolithic codebase, they’ll probably disagree. That’s fine, and highlights the value of guidance: it makes it possible to make incremental progress in areas where more concrete direction would cause confusion. When you’re working on a strategy’s policy section, it’s important to consider all of these categories. Which feel most natural to use will vary depending on your team and role, but they’re all usable: If you’re a developer productivity team, you might have to lean heavily on guidance in your policies and increased support for that guidance within the details of your platform. If you’re an executive, you might lean heavily on direction. Indeed, you might lean too heavily on direction, where guidance often works better for areas where you understand the direction but not the path. If you’re a product engineering organization, you might have to narrow the scope of your direction to the engineers within that organization to deal with the realities of complex cross-organization dynamics. Finally, if you have a clear approach you want to take that doesn’t fit cleanly into any of these categories, then don’t let this framework dissuade you. Give it a try, and adapt if it doesn’t initially work out. Maintaining strategy altitude The chapter on when to write engineering strategy introduced the concept of strategy altitude, which is being deliberate about where certain kinds of policies are created within your organization. Without repeating that section in its entirety, it’s particularly relevant when you set policy to consider how your new policies eliminate flexibility within your organization. Consider these two somewhat opposing strategies: Stripe’s Sorbet strategy only worked in an organization that enforced the use of a single programming language across (essentially) all teams Uber’s service migration strategy worked well in an organization that was unwilling to enforce consistent programming language adoption across teams Stripe’s organization-altitude policy took away the freedom of individual teams to select their preferred technology stack. In return, they unlocked the ability to centralize investment in a powerful way. Uber went the opposite way, unlocking the ability of teams to pick their preferred technology stack, while significantly reducing their centralized teams’ leverage. Both altitudes make sense. Both have consequences. Criteria for effective policies In The Engineering Executive’s Primer’s chapter on engineering strategy, I introduced three criteria for evaluating policies. They ought to be applicable, enforced, and create leverage. Defining those a bit: Applicable: it can be used to navigate complex, real scenarios, particularly when making tradeoffs. Enforced: teams will be held accountable for following the guiding policy. Create Leverage: create compounding or multiplicative impact. The last of these three, create leverage, made sense in the context of a book about engineering executives, but probably doesn’t make as much sense here. Some policies certainly should create leverage (e.g. empower developer experience team by restricting new services), but others might not (e.g. moving to an N-1 backfill policy). Outside the executive context, what’s important isn’t necessarily creating leverage, but that a policy solves for part of the diagnosis. That leaves the other two–being applicable and enforced–both of which are necessary for a policy to actually address the diagnosis. Any policy which you can’t determine how to apply, or aren’t willing to enforce, simply won’t be useful. Let’s apply these criteria to a handful of potential policies. First let’s think about policies we might write to improve the talent density of our engineering team: “We only hire world-class engineers.” This isn’t applicable, because it’s unclear what a world-class engineer means. Because there’s no mutually agreeable definition in this policy, it’s also not consistently enforceable. “We only hire engineers that get at least one ‘strong yes’ in scorecards.” This is applicable, because there’s a clear definition. This is enforceable, depending on the willingness of the organization to reject seemingly good candidates who don’t happen to get a strong yes. Next, let’s think about a policy regarding code reuse within a codebase: “We follow a strict Don’t Repeat Yourself policy in our codebase.” There’s room for debate within a team about whether two pieces of code are truly duplicative, but this is generally applicable. Because there’s room for debate, it’s a very context specific determination to decide how to enforce a decision. “Code authors are responsible for determining if their contributions violate Don’t Repeat Yourself, and rewriting them if they do.” This is much more applicable, because now there’s only a single person’s judgment to assess the potential repetition. In some ways, this policy is also more enforceable, because there’s no longer any ambiguity around who is deciding whether a piece of code is a repetition. The challenge is that enforceability now depends on one individual, and making this policy effective will require holding individuals accountable for the quality of their judgement. An organization that’s unwilling to distinguish between good and bad judgment won’t get any value out of the policy. This is a good example of how a good policy in one organization might become a poor policy in another. If you ever find yourself wanting to include a policy that for some reason either can’t be applied or can’t be enforced, stop to ask yourself what you’re trying to accomplish and ponder if there’s a different policy that might be better suited to that goal. Developing novel policies My experience is that there are vanishingly few truly novel policies to write. There’s almost always someone else has already done something similar to your intended approach. Calm’s engineering strategy is such a case: the details are particular to the company, but the general approach is common across the industry. The most likely place to find truly novel policies is during the adoption phase of a new widespread technology, such as the rise of ubiquitous mobile phones, cloud computing, or large language models. Even then, as explored in the strategy for adopting large-language models, the new technology can be engaged with as a generic technology: Develop an LLM-backed process for reactivating departed and suspended drivers in mature markets. Through modeling our driver lifecycle, we determined that improving onboarding time will have little impact on the total number of active drivers. Instead, we are focusing on mechanisms to reactivate departed and suspended drivers, which is the only opportunity to meaningfully impact active drivers. You could simply replace “LLM” with “data-driven” and it would be equally readable. In this way, policy can generally sidestep areas of uncertainty by being a bit abstract. This avoids being overly specific about topics you simply don’t know much about. However, even if your policy isn’t novel to the industry, it might still be novel to you or your organization. The steps that I’ve found useful to debug novel policies are the same steps as running a condensed version of the strategy process, with a focus on exploration and refinement: Collect a number of similar policies, with a focus on how those policies differ from the policy you are creating Create a systems model to articulate how this policy will work, and also how it will differ from the similar policies you’re considering Run a strategy testing cycle for your proto-policy to discover any unknown-unknowns about how it works in practice Whether you run into this scenario is largely a function of the extent of your, and your organization’s, experience. Early in my career, I found myself doing novel (for me) strategy work very frequently, and these days I rarely find myself doing novel work, instead focusing on adaptation of well-known policies to new circumstances. Are competing policy proposals an anti-pattern? When creating policy, you’ll often have to engage with the question of whether you should develop one preferred policy or a series of potential strategies to pick from. Developing these is a useful stage of setting policy, but rather than helping you refine your policy, I’d encourage you to think of this as exposing gaps in your diagnosis. For example, when Stripe developed the Sorbet ruby-typing tooling, there was debate between two policies: Should we build a ruby-typing tool to allow a centralized team to gradually migrate the company to a typed codebase? Should we migrate the codebase to a preexisting strongly typed language like Golang or Java? These were, initially, equally valid hypotheses. It was only by clarifying our diagnosis around resourcing that it became clear that incurring the bulk of costs in a centralized team was clearly preferable to spreading the costs across many teams. Specifically, recognizing that we wanted to prioritize short-term product engineering velocity, even if it led to a longer migration overall. If you do develop multiple policy options, I encourage you to move the alternatives into an appendix rather than including them in the core of your strategy document. This will make it easier for readers of your final version to understand how to follow your policies, and they are the most important long-term user of your written strategy. Recognizing constraints A similar problem to competing solutions is developing a policy that you cannot possibly fund. It’s easy to get enamored with policies that you can’t meaningfully enforce, but that’s bad policy, even if it would work in an alternate universe where it was possible to enforce or resource it. To consider a few examples: The strategy for controlling access to user data might have proposed requiring manual approval by a second party of every access to customer data. However, that would have gone nowhere. Our approach to Uber’s service migration might have required more staffing for the infrastructure engineering team, but we knew that wasn’t going to happen, so it was a meaningless policy proposal to make. The strategy for navigating private equity ownership might have argued that new ownership should not hold engineering accountable to a new standard on spending. But they would have just invalidated that strategy in the next financial planning period. If you find a policy that contemplates an impractical approach, it doesn’t only indicate that the policy is a poor one, it also suggests your policy is missing an important pillar. Rather than debating the policy options, the fastest path to resolution is to align on the diagnosis that would invalidate potential paths forward. In cases where aligning on the diagnosis isn’t possible, for example because you simply don’t understand the possibilities of a new technology as encountered in the strategy for adopting LLMs, then you’ve typically found a valuable opportunity to use strategy refinement to build alignment. Dealing with missing strategies At a recent company offsite, we were debating which policies we might adopt to deal with annual plans that kept getting derailed after less than a month. Someone remarked that this would be much easier if we could get the executive team to commit to a clearer, written strategy about which business units we were prioritizing. They were, of course, right. It would be much easier. Unfortunately, it goes back to the problem we discussed in the diagnosis chapter about reframing blockers into diagnosis. If a strategy from the company or a peer function is missing, the empowering thing to do is to include the absence in your diagnosis and move forward. Sometimes, even when you do this, it’s easy to fall back into the belief that you cannot set a policy because a peer function might set a conflicting policy in the future. Whether you’re an executive or an engineer, you’ll never have the details you want to make the ideal policy. Meaningful leadership requires taking meaningful risks, which is never something that gets comfortable. Summary After working through this chapter, you know how to develop policy, how to assemble policies to solve your diagnosis, and how to avoid a number of the frequent challenges that policy writers encounter. At this point, there’s only one phase of strategy left to dig into, operating the policies you’ve created.

21 hours ago 3 votes
Fast and random sampling in SQLite

I was building a small feature for the Flickr Commons Explorer today: show a random selection of photos from the entire collection. I wanted a fast and varied set of photos. This meant getting a random sample of rows from a SQLite table (because the Explorer stores all its data in SQLite). I’m happy with the code I settled on, but it took several attempts to get right. Approach #1: ORDER BY RANDOM() My first attempt was pretty naïve – I used an ORDER BY RANDOM() clause to sort the table, then limit the results: SELECT * FROM photos ORDER BY random() LIMIT 10 This query works, but it was slow – about half a second to sample a table with 2 million photos (which is very small by SQLite standards). This query would run on every request for the homepage, so that latency is unacceptable. It’s slow because it forces SQLite to generate a value for every row, then sort all the rows, and only then does it apply the limit. SQLite is fast, but there’s only so fast you can sort millions of values. I found a suggestion from Stack Overflow user Ali to do a random sort on the id column first, pick my IDs from that, and only fetch the whole row for the photos I’m selecting: SELECT * FROM photos WHERE id IN ( SELECT id FROM photos ORDER BY RANDOM() LIMIT 10 ) This means SQLite only has to load the rows it’s returning, not every row in the database. This query was over three times faster – about 0.15s – but that’s still slower than I wanted. Approach #2: WHERE rowid > (…) Scrolling down the Stack Overflow page, I found an answer by Max Shenfield with a different approach: SELECT * FROM photos WHERE rowid > ( ABS(RANDOM()) % (SELECT max(rowid) FROM photos) ) LIMIT 10 The rowid is a unique identifier that’s used as a primary key in most SQLite tables, and it can be looked up very quickly. SQLite automatically assigns a unique rowid unless you explicitly tell it not to, or create your own integer primary key. This query works by picking a point between the biggest and smallest rowid values used in the table, then getting the rows with rowids which are higher than that point. If you want to know more, Max’s answer has a more detailed explanation. This query is much faster – around 0.0008s – but I didn’t go this route. The result is more like a random slice than a random sample. In my testing, it always returned contiguous rows – 101, 102, 103, … – which isn’t what I want. The photos in the Commons Explorer database were inserted in upload order, so photos with adjacent row IDs were uploaded at around the same time and are probably quite similar. I’d get one photo of an old plane, then nine more photos of other planes. I want more variety! (This behaviour isn’t guaranteed – if you don’t add an ORDER BY clause to a SELECT query, then the order of results is undefined. SQLite is returning rows in rowid order in my table, and a quick Google suggests that’s pretty common, but that may not be true in all cases. It doesn’t affect whether I want to use this approach, but I mention it here because I was confused about the ordering when I read this code.) Approach #3: Select random rowid values outside SQLite Max’s answer was the first time I’d heard of rowid, and it gave me an idea – what if I chose random rowid values outside SQLite? This is a less “pure” approach because I’m not doing everything in the database, but I’m happy with that if it gets the result I want. Here’s the procedure I came up with: Create an empty list to store our sample. Find the highest rowid that’s currently in use: sqlite> SELECT MAX(rowid) FROM photos; 1913389 Use a random number generator to pick a rowid between 1 and the highest rowid: >>> import random >>> random.randint(1, max_rowid) 196476 If we’ve already got this rowid, discard it and generate a new one. (The rowid is a signed, 64-bit integer, so the minimum possible value is always 1.) Look for a row with that rowid: SELECT * FROM photos WHERE rowid = 196476 If such a row exists, add it to our sample. If we have enough items in our sample, we’re done. Otherwise, return to step 3 and generate another rowid. If such a row doesn’t exist, return to step 3 and generate another rowid. This requires a bit more code, but it returns a diverse sample of photos, which is what I really care about. It’s a bit slower, but still plenty fast enough (about 0.001s). This approach is best for tables where the rowid values are mostly contiguous – it would be slower if there are lots of rowids between 1 and the max that don’t exist. If there are large gaps in rowid values, you might try multiple missing entries before finding a valid row, slowing down the query. You might want to try something different, like tracking valid rowid values separately. This is a good fit for my use case, because photos don’t get removed from Flickr Commons very often. Once a row is written, it sticks around, and over 97% of the possible rowid values do exist. Summary Here are the four approaches I tried: Approach Performance (for 2M rows) Notes ORDER BY RANDOM() ~0.5s Slowest, easiest to read WHERE id IN (SELECT id …) ~0.15s Faster, still fairly easy to understand WHERE rowid > ... ~0.0008s Returns clustered results Random rowid in Python ~0.001s Fast and returns varied results, requires code outside SQL I’m using the random rowid in Python in the Commons Explorer, trading code complexity for speed. I’m using this random sample to render a web page, so it’s important that it returns quickly – when I was testing ORDER BY RANDOM(), I could feel myself waiting for the page to load. But I’ve used ORDER BY RANDOM() in the past, especially for asynchronous data pipelines where I don’t care about absolute performance. It’s simpler to read and easier to see what’s going on. Now it’s your turn – visit the Commons Explorer and see what random gems you can find. Let me know if you spot anything cool! [If the formatting of this post looks odd in your feed reader, visit the original article]

12 hours ago 1 votes
Choosing Languages
yesterday 3 votes
05 · Syncing Keyhive

How we sync Keyhive and Automerge

yesterday 1 votes