Stanislav Petrov

from Eric Bailey [alt+shift+b] in programming

A lieutenant colonel in the Soviet Air Defense Forces prevented the end of human civilization on September 26th, 1983. His name was Stanislav Petrov. Protocol dictated that the Soviet Union would retaliate against any nuclear strikes sent by the United States. This was a policy of mutually assured destruction, a doctrine that compels a horrifying logical conclusion. The second and third stage effects of this type of exchange would be even more catastrophic. Allies for each side would likely be pulled into the conflict. The resulting nuclear winter was projected to lead to 2 billion deaths due to starvation. This is to say nothing about those who would have been unfortunate enough to have survived. Petrov’s job was to monitor Oko, the computerized warning systems built to centralize Soviet satellite communications. Around midnight, he received a report that one of the satellites had detected the infrared signature of a single launch of a United States ICBM. While Petrov was deciding...

a month ago

Remove from reading list Add to reading list [alt+a] Read now [→]

Improve your reading experience

Logged in users get linked directly to articles resulting in a better reading experience. Please login for free, it takes less than 1 minute.

More from Eric Bailey

Harm reduction principles for digital accessibility practitioners

I debuted these principles in my axe-con 2025 talk, It is designed to break your heart: Cultivating a harm reduction mindset as an accessibility practitioner. They are adapted from The National Harm Reduction Coalition’s original eight principles. My adapted principles reflect philosophical and behavioral changes I’ve been cultivating. This is done to try and offset, and defend against systemic trauma and its resultant depression, burnout, and other negative experiences you can incur when doing digital accessibility work. If you have the time, I’d advise reading the original eight principles. I also recommend watching or reading the talk. I say this not in a self-promotional way, but instead that there is a lot of context that will be helpful in understanding: How these adapted principles came to be, and also The larger mindset shifts and practices that led to their creation. The principles There are eight principles in total. They are delivered in the context of how to approach evaluating a team’s efforts, and are: Accepting ableism and minimizing it Accepting, for better or worse, that ableism is a part of our world and choosing to work to minimize its harmful effects, rather than simply ignoring or condemning it. The original principle this is derived from is: “Accepts, for better or worse, that licit and illicit drug use is part of our world and chooses to work to minimize its harmful effects rather than simply ignore or condemn them.” Provisioning of resources is non-judgemental Calling for the non-judgemental provision of services and resources for people who create access barriers within the disciplines in which they work, in order to assist them in reducing harm. The original principle this is derived from is: “Calls for the non-judgmental, non-coercive provision of services and resources to people who use drugs and the communities in which they live in order to assist them in reducing attendant harm.” Do not minimize or ignore real harm Does not attempt to minimize or ignore the real and tragic harm and danger that can be created by inaccessible experiences. The original principle this is derived from is: “Does not attempt to minimize or ignore the real and tragic harm and danger that can be associated with illicit drug use.” Some barriers are worse than others Understands that how access barriers are created is a complex, multi-faceted phenomenon that encompasses a range of severities from life-endangering to annoying, and acknowledges that some barriers are clearly worse than others. The original principle this is derived from is: “Understands drug use as a complex, multi-faceted phenomenon that encompasses a continuum of behaviors from severe use to total abstinence, and acknowledges that some ways of using drugs are clearly safer than others.” Social inequalities affect vulnerability Recognizes that the realities of poverty, class, racism, social isolation, past trauma, sex-based discrimination, and other social inequalities affect both people’s vulnerability to, and capacity for effectively dealing with creating inaccessible experiences. The original principle this is derived from is: “Recognizes that the realities of poverty, class, racism, social isolation, past trauma, sex-based discrimination, and other social inequalities affect both people’s vulnerability to and capacity for effectively dealing with drug-related harm.” Improvement of quality is success Establishes quality of individual and team life and well-being—not necessarily cessation of all current workflows—as the criteria for successful interventions and policies. The original principle this is derived from is: “Establishes quality of individual and community life and well-being—not necessarily cessation of all drug use—as the criteria for successful interventions and policies.” Empowering people also helps their peers Affirms people who create access barriers themselves as the primary agents of reducing the harms of their efforts, and seeks to empower them to share information and support each other in creating and using remediation strategies that are effective for their daily workflows. The original principle this is derived from is: “Affirms people who use drugs themselves as the primary agents of reducing the harms of their drug use and seeks to empower people who use drugs to share information and support each other in strategies which meet their actual conditions of use.” Ensure that disabled people have a voice in change Ensures that people who are affected by access barriers, and those who have been affected by your organization’s access barriers, have a real voice in the creation of features and services designed to serve them. The original principle this is derived from is: “Ensures that people who use drugs and those with a history of drug use routinely have a real voice in the creation of programs and policies designed to serve them.” Reframe My talk digs deeper into into the parallels between the adapted and original principles, as well as the similarities between digital accessibility and harm reduction work. This is in the service of attempting to reframe our efforts. By this, I mean that we are miscategorized participants in imperfect, trauma-generating systems. The change in perspective I am advocating for also compels changes in behavior in order to not only survive, but also flourish as digital accessibility practitioners. The adapted principles are integral to making this effort successful.

a week ago • 7 votes

Evaluating overlay-adjacent accessibility products

I get asked about my opinion on overlay-adjacent accessibility products with enough frequency that I thought it could be helpful to write about it. There’s a category of third party products out there that are almost, but not quite an accessibility overlay. By this I mean that they seem a little less predatory, and a little more grounded in terms of the promises they make. Some of these products are widgets. Some are browser extensions. Some are apps. Some are an odd fourth thing. Sometimes it’s a case of a solutioneering disability dongle grift, sometimes its a case of good intentions executed in a less-than-optimal way, and sometimes it’s something legitimately helpful. Oftentimes it’s something that lies in the middle area of all of this. Many of them also have some sort of “AI” integration, which is the unfortunate upsell du jour we have to collectively endure for the time being. The rubric I use to evaluate these products remains very similar to how I scrutinize overlays. Hopefully it’s something that can be helpful for your own efforts. Should the product’s functionality be patented? I’m not very happy with the idea that the mechanism to operate something in an accessible way is inhibited by way of legal restriction. This artificially limits who can use it, which is in opposition to the overall mission of digital accessibility. Ideally the technology is the free bit, and the service that facilitates it is what generates the profit. Do I need to subscribe to use it? A subscription-based model is a great way to run a business, but you don’t need to pay a recurring fee to use an accessible website. The nature of the web’s technology means it can be operated via keyboard, voice control, and other assistive technology if constructed properly. Workarounds and community support also exist for some things where it’s not built well. Here I’d also like you to consider the disability tax, and how that factors into a rental model. It’s not great. Does the browser or operating system already have this functionality? A lot of the time this boils down to an issue of discovery, digital literacy, or identity. As touched on in the previous section, browsers and operating systems offer a lot to help you self-serve. Notable examples are reading mode, on-screen narration, color filters, interface and text zoom, and forced color inversion. Can it be used across multiple experiences, or just one website? Stability and predictability of operation and output are vital for technology like this. It’s why I am so bullish on utilizing existing browser and operating system features. Products built to “enhance” the accessibility of a single website or app can’t contribute towards this. Ironically, their presence may actually contribute friction towards someone’s existing method of using things. A tricky little twist here is products that target a single website are often advertised towards the website owner, and not the people who will be using said website. Can I use the keyboard to operate it? I’ve gotten in the habit of pressing Tab a few times when I first check out the product’s website and see if anything happens. It’s a quick and easy test to see if the company walks the walk in addition to talking the talk. Here, I regrettably encounter missing focus indicators and non-semantic interactive controls more often than not. I might also sometimes run the homepage through axe DevTools, to see if there are other egregious errors. I then try to use the product itself with a keyboard if a demo is offered. I am usually found wanting here. How reliable is the AI? There are two broad considerations here: How reliable is the output? How can bias affect someone’s interpretation of things? While I am a skeptic, I can also acknowledge that there are some good use cases for LLMs and related technology when it comes to disability. I think about reliability in terms of the output in terms of the “assistive” part of assistive technology. By this, I mean it actually helps you do what you need to get done. Here, I’d point to Salma Alam-Naylor’s experience with newer startups in this space versus established, community supported solutions. Then consider LLM-based image description products. Here we want to make sure the content is accurate and relevant. Remember that image descriptions are the mechanism that some people rely on to help them understand the world. If that description is not accurate, it impacts how they form an understanding of their environment. A step past that thought is the biases inherent in, and perpetuated by LLM-based technology. I recall Ben Myers’ thoughts on implicit, hegemonic normalization, as well as the sobering truth that this technology can exert influence over its users worldview at scale. Can the company be trusted with your data? A lot of assistive technology is purposely designed to not announce the fact that it is being used. This is to stave off things like discrimination or ineffective, separate-yet-equal “accessibility only” sites. There’s also the murky world of data brokerage, and if the company is selling off this information or not. AccessiBe comes to mind here, and not in a good way. Also consider if the product has access to everything you visit and interact with, and who has access to that information. As a companion concern, it is also worth considering the product’s data security practices—or lack thereof. Here, I would like to point out that startups tend to deprioritize this boring kind of infrastructure work in favor of feature creation. Not having any personal information present in a system is the best way to guard against its theft. Also know that there is no way to undo a data breach once it occurs. Leaked information stays leaked. Will the company last? Speaking of startups, know that more fail than succeed. Are you prepared for an outcome where the product you rely on is is no longer updated or supported because the company that made it went out of business? It could also be a case where the company still exists, but ceases to support the product you use. Here, know that sometimes these companies will actively squash attempts for community-based resurrection and support of the service because it represents potential liability. This concern is another reason why I’m bullish on operating system and browser functionality. They have a lot more resiliency and focus on the long view in this particular area. But also I’m not the arbiter of who can use what. In the spirit of “the best camera is the one you have on you:” if something works for your specific access needs, by all means use it.

4 weeks ago • 19 votes

GitHub’s updated Commits page and the interactive list component

GitHub has updated the page template used to list Commits on a repository. Central to this experience is an interactive list component that I was responsible for architecting. This work was done alongside input from James Scholes, whose guidance was instrumental to the effort’s success. An interactive list is a construct that’s more commonplace on desktop applications than the web. That does not mean its approach is forbidden from being used for web experiences, however. What concerns does an interactive list address? The main concern an interactive list addresses is when each discrete item in a series contains multiple interactive child elements. Navigating through every child interactive element placed with each parent list item can be a tedious enough chore that it makes the effort a non-starter. For example, if the list has ten items and each item has seven interactive child elements, that means it takes up to seventy Tab keypresses someone needs to perform to get what they need. That’s an exhausting experience to endure. It could also be agonizing. Think motor control disabilities, where individual movements in aggregate can exceed someone’s pain tolerance threshold. Making each list item’s container itself focusable and traversable addresses this problem, as it lowers the number of keypresses someone needs to use. It also supports allowing you to quickly jump to the start or end of the list for even more navigation options. On GitHub, navigating an interactive list via your keyboard can be accomplished by pressing: Tab: Places focus on the interactive list item that last received focus. Defaults to the first item in the list if the list was previously not interacted with. Down: Moves focus to the next list item, if present. Up: Moves focus to the previous list item, if present. End: Moves focus to the last list item in the interactive list. Home: Moves focus to the first list item in the interactive list. There’s a trick here: We want to make sure each list item’s announcement contains enough information that someone can make an informed choice when navigating via a screen reader. We also do not want to make the announcement so verbose that it slows down the navigation process. For example, we only include the commit title when navigating via list item on the Commits page. For an Issue, we use: The Issue title, Its status, and Its author (there is currently a bug here, we’re working on fixing it). There is an intentionality behind the order of content in this announcement, as we want to include the most pertinent information first. This, in turn, helps people navigating by list item announcement make more informed choices faster. This lets us know: What the problem is, Has it been dealt with yet, and Who found the problem? We also use the term “More information available below” to signal that someone can explore the list item’s child content in more detail. This is accomplished via pressing: Tab: Navigates forwards through each child interactive element in sequence. Shift + Tab: Navigates backwards through each child interactive element in sequence. Esc: Moves focus out of the child interactive elements and places it back on the parent list item itself. Examples of child content that someone could encounter are an Issues’ author, its labels, linked Pull Requests, comment tally, and assignees. Problems The use of the phrase “More information available below” does not sit well with me, despite being the person who oversaw its inclusion. There’s a couple of reasons here: First, I’m normally loathe to hardcode interaction hints for screen readers. The interactive list component is a bit of an exception to that rule. It is an uncommon interaction pattern on the web, so the hint needs to be included until efforts to formalize it both: Manifest, and Get widespread support from assistive technology vendors. Without these two things, I fear that blind and low vision individuals will not be able to fully utilize the experience the same way their peers can. Second, the hint phrasing itself isn’t that great. The location-based term “below” is shorthand to try and communicate that there’s subsequent child content that is related to the list item’s main content. While “subsequent child content that is related to the list item’s main content” is more descriptive, it’s an earful. I am very much open to suggestions for a replacement phrase. And this potential for change sets up other things that weigh on me. Bigger problems Using this interactive list component on the Commits page template means there are now two main areas on GitHub where the component is present. The second being the lists of repository Issues for logged-in accounts. Large, structural changes to a design’s underlying semantics disrupts the mental model and muscle memory of how many people who use screen readers operate an experience. It’s an act that I’m always nervous about undertaking. The calculated bet here is that the prominence of the components on these high-traffic areas means that understanding how to operate them becomes easier over time. I’ve also hedged that bet by including alternate ways of navigating the interactive list, including baking headings into each Commit and Issue title. HeadingsMap. I do think that this update to each page’s semantic structure is net better than what came before it. However, it is still going to manifest as a large and sudden change for people who use screen readers. And for the record, I view changing the “More information available below” phrasing as another large and disruptive change. Subsequent large and sudden changes is what I want to avoid at all costs. That said, we’re running out the clock on a situation where an interactive list will someday contain non-interactive content. The component’s current approach does not have a great way for people to be aware of, and subsequently read that kind of content. That’s not great. Because of this inevitability, I would like to replace the list’s interaction approach with the one we’re using for nested/sub-Issues. There are a few reasons for this, but the main ones are: Improving consistency and uniformity of interaction across all of GitHub for this kind of clustering of content. Leaning on more well-known interaction techniques for secondary content within an item by using dialogs instead of Tab keypresses. Providing a mechanism that can more easily handle exploring non-interactive content being placed within a list item. Making these changes would mean a drastic update on top of another drastic update. While I do think it would be a better overall experience, rolling it out would require a lot of careful effort and planning. Even bigger problems In many ways, GitHub is a battleship. It is slow to turn just by virtue of the sheer size and scale of concerns it needs to cover. Enacting my goal of replacing and unifying these kinds of interactions would take time: It would mean petitioning for heavy investment in something that may be perceived as an already “solved” problem. It also would require collaboration across multiple siloed product areas, each with their own pre-existing and planned objectives and priorities. I have the gift of hindsight in writing this. The interactive list was originally intended to address just the list of repository Issues. Its usage has since has grown to cover more use cases—not all of them actually applicable. This is one of the existential problems of a design system. You can write all the documentation you want, but people are ultimately going to use what they’re going to use regardless of if its appropriate or not. Replacing or excising misapplied components is another effort that runs counter to organization priorities. That truth lives hand-in-hand with the need to maintain the overall state of usability for everyone who uses the service. You’re gonna carry that weight Making dramatic changes to core parts of GitHub’s assistive technology user experience, followed by more dramatic changes, then potentially followed by even more dramatic changes is an outcome we’re potentially facing. It is the nature of software—especially websites and web apps—to change. That said, I worry about the overall churn this all could represent. I feel the weight of that responsibility as the person who set this course. I also feel the consequent pressure it exerts. I’ll continue to write about and plead the case internally. However, I worry that I’ve blown my one chance to get things right. I know my colleagues who produce visual designs also may feel this way, but I also think it’s a more acute problem for digital accessibility. I also don’t think that this sort of situation is one that’s talked about that often in accessibility spaces, hence me writing about it. This is to say nothing about quantifying it, either. Centering I’m pretty proud of what we accomplished, but those feelings are moot if all this effort does not serve the people it was intended to. It’s also not about me. Our efforts to be more inclusive may ironically work against us here. How much churn is the point where it’s too much and people are pushed away? To that point, feedback helps. Constructive reports on access barriers and friction are something that can bypass the internal perception of the things I’ve outlined as being seen as non-problems. I am twice heartened when I see reports. First, it is a signal that means someone is still present and cares. Second, there has been renewed internal interest in investing in acting on these user-reported accessibility problems. The work never stops This post is about interactive lists on GitHub, and how to use them. It’s also about: The responsibilities, pressures, and politics of creating complex components like the interactive list and ensuring they are accessible, How these types of components affect the larger, holistic experience of GitHub as a whole, The need to ensure these components actually work for the people they serve, and The value of providing feedback if they don’t. These are powerful things to internalize if you also do this sort of work, but also valuable to keep in mind if you don’t. The have served me well in my journey at GitHub, and I hope they help to serve you too.

2 months ago • 21 votes

Don’t forget to localize your icons

Former United States president and war criminal George W. Bush gave a speech in Australia, directing a v-for-victory hand gesture at the assembled crowd. It wasn’t received the way he intended. What he failed to realize is that this gesture means a lot of different things to a lot different people. In Australia, the v-for-victory gesture means the same as giving someone the middle finger in the United States. This is all to say that localization is difficult. Localizing your app, web app, or website is more than just running all your text through Google Translate and hoping for the best. Creating effective, trustworthy communication with language communities means doing the work to make sure your content meets them where they are. A big part of this is learning about, and incorporating cultural norms into your efforts. Doing so will help you avoid committing any number of unintentional faux pas. In this best case scenario these goofs will create an awkward and potentially funny outcome: In the worst case, it will eradicate any sense of trust you’re attempting to build. Trust There is no magic number for how many mistranslated pieces of content flips the switch from tolerant bemusement to mistrust and anger. Each person running into these mistakes has a different tolerance threshold. Additionally, that threshold is also variable depending on factors such as level of stress, seriousness of the task at hand, prior interactions, etc. If you’re operating a business, loss of trust may mean less sales. Loss of trust may have far more serious ramifications if it’s a government service. Let’s also not forget that it is language communities and not individuals. Word-of-mouth does a lot of heavy lifting here, especially for underserved and historically discriminated-against populations. To that point, reputational harm is also a thing you need to contend with. Because of this, we need to remember all the things that are frequently left out of translation and localization efforts. For this post, I’d like to focus on icons. Iconic We tend to think of icons as immutable glyphs whose metaphors convey platonic functionality and purpose. A little box with an abstract mountain and a rising sun? I bet that lets you insert a picture. And how about a right-facing triangle? Five dollars says it plays something. However, these metaphors start to fall apart when not handled with care and discretion. If your imagery is too abstract it might not read the way it is intended to, especially for more obscure or niche functionality. Transit. Similarly, objects or concepts that don’t exist in the demographics you are serving won’t directly translate well. It will take work, but the results can be amazing. An exellent example of accommodation is Firefox OS’ localization efforts with the Fula people. Culture impacts how icons are interpreted, understood, and used, just like all other content. Here, I’d specifically like to call attention to three commonly-found icons whose meanings can be vasty different depending on the person using them. I would also like to highlight something that all three of these icons have in common: they use hand gestures to represent functionality. This makes a lot of sense! Us humans have been using our hands to communicate things for about as long as humanity itself has existed. It’s natural to take this communication and apply it to a digital medium. That said, we also need to acknowledge that due to their widespread use that these gestures—and therefore the icons that use them—can be interpreted differently by cultures and language communities that are different than the one who added the icons to the experience. The three icons themselves are thumb’s up, thumb’s down, and the okay hand symbol. Let’s unpack them: Thumb’s up What it’s intended to be used for This icon usually means expressing favor for something. It is typically also a tally, used as a signal for how popular the content is with an audience. Facebook did a lot of heavy lifting here with its Like button. In the same breath I’d also like to say that Facebook is a great example of how ignoring culture when serving a global audience can lead to disastrous outcomes. Who could be insulted by it In addition to expressing favor or approval, a thumb’s up can also be insulting in cultures originating from the following regions (not a comprehensive list): Bangladesh, Some parts of West Africa, Iran, Iraq, Afganistan, Some parts of Russia, Some parts of Latin America, and Australia, if you also waggle it up and down. It was also not a great gesture to be on the receiving end of in Rome, specifically if you were a downed gladiator at the mercy of the crowd. What you could use instead If it’s a binary “I like this/I don’t like this” choice, consider symbols like stars and hearts. Sparkles are out, because AI has ruined them. I’m also quite partial to just naming the action—after all the best icon is a text label. Thumb’s down What it’s intended to be used for This icon is commonly paired with a thumb’s up as part of a tally-based rating system. People can express their dislike of the content, which in turn can signal if the content failed to find a welcome reception. Who could be insulted by it A thumb’s down has a near-universal negative connotation, even in cultures where its use is intentional. It is also straight-up insulting in Japan. It may also have gang-related connotations. I’m hesitant to comment on that given how prevalent misinformation is about that sort of thing, but it’s also a good reminder of how symbolism can be adapted in ways we may not initially consider outside of “traditional” channels. Like the thumb’s up gesture, this is also not a comprehensive list. I’m a designer, not an ethnographic researcher. What you could use instead Consider removing outrage-based metrics. They’re easy to abuse and subvert, exploitative, and not psychologically healthy. If you well and truly need that quant data consider going with a rating scale instead of a combination of thumb’s up and thumb’s down icons. You might also want to consider ditching rating all together if you want people to actually read your content, or if you want to encourage more diversity of expression. Okay What it’s intended to be used for This symbol is usually used to represent acceptance or approval. Who could be insulted by it People from Greece may take offense to an okay hand symbol. The gesture might have also offended people in France and Spain when performed by hand, but that may have passed. Who could be threatened by it The okay hand sign has also been subverted by 4chan and co-opted by the White supremacy movement. An okay hand sign’s presence could be read as a threat by a population who is targeted by White supremacist hate. Here, it could be someone using it without knowing. It could also be a dogwhistle put in place by either a bad actor within an organization, or the entire organization itself. Thanks to the problem of other minds, the person on the receiving end cannot be sure about the underlying intent. Because of this, the safest option is to just up and leave. What you could use instead Terms like “I understand”, “I accept”, and “acknowledged” all work well here. I’d also be wary of using checkmarks, in that their meaning also isn’t a guarantee. So, what symbols can I use? There is no one true answer here, only degrees of certainty. Knowing what ideas, terms, and images are understood, accepted by, or offend a culture requires doing research. There is also the fact that the interpretation of these symbols can change over time. For this fact, I’d like to point out that pejorative imagery can sometimes become accepted due to constant, unending mass exposure. We won’t go back to using a Swastika to indicate good luck any time soon. However, the homogenization effect of the web’s implicit Western bias means that things like thumb’s up icons everywhere is just something people begrudgingly get used to. This doesn’t mean that we have to capitulate, however! Adapting your iconography to meet a language culture where it’s at can go a long way to demonstrating deep care. Just be sure that the rest of your localization efforts match the care you put into your icons and images. Otherwise it will leave the experience feeling off. An example of this is using imagery that feels natural in the language culture you’re serving, but having awkward and stilted text content. This disharmonious mismatch in tone will be noticed and felt, even if it isn’t concretely tied to any one thing. Different things mean different things in different ways Effective, clear communication that is interpreted as intended is a complicated thing to do. This gets even more intricate when factors like language, culture, and community enter the mix. Taking the time to do research, and also perform outreach to the communities you wish to communicate with can take a lot of work. But doing so will lead to better experiences, and therefore outcomes for all involved. Take stock of the images and icons you use as you undertake, or revisit your localization efforts. There may be more to it than you initially thought.

4 months ago • 33 votes

More in programming

ChatGPT Would be a Decent Policy Advisor

Revealed: How the UK tech secretary uses ChatGPT for policy advice by Chris Stokel-Walker for the New Scientist

16 hours ago • 3 votes

Setting policy for strategy.

This book’s introduction started by defining strategy as “making decisions.” Then we dug into exploration, diagnosis, and refinement: three chapters where you could argue that we didn’t decide anything at all. Clarifying the problem to be solved is the prerequisite of effective decision making, but eventually decisions do have to be made. Here in this chapter on policy, and the following chapter on operations, we finally start to actually make some decisions. In this chapter, we’ll dig into: How we define policy, and how setting policy differs from operating policy as discussed in the next chapter The structured steps for setting policy How many policies should you set? Is it preferable to have one policy, many policies, or does it not matter much either way? Recurring kinds of policies that appear frequently in strategies Why it’s valuable to be intentional about your strategy’s altitude, and how engineers and executives generally maintain different altitudes in their strategies Criteria to use for evaluating whether your policies are likely to be impactful How to develop novel policies, and why it’s rare Why having multiple bundles of alternative policies is generally a phase in strategy development that indicates a gap in your diagnosis How policies that ignore constraints sound inspirational, but accomplish little Dealing with ambiguity and uncertainty created by missing strategies from cross-functional stakeholders By the end, you’ll be ready to evaluate why an existing strategy’s policies are struggling to make an impact, and to start iterating on policies for strategy of your own. This is an exploratory, draft chapter for a book on engineering strategy that I’m brainstorming in #eng-strategy-book. As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts. What is policy? Policy is interpreting your diagnosis into a concrete plan. That plan will be a collection of decisions, tradeoffs, and approaches. They’ll range from coding practices, to hiring mandates, to architectural decisions, to guidance about how choices are made within your organization. An effective policy solves the entirety of the strategy’s diagnosis, although the diagnosis itself is encouraged to specify which aspects can be ignored. For example, the strategy for working with private equity ownership acknowledges in its diagnosis that they don’t have clear guidance on what kind of reduction to expect: Based on general practice, it seems likely that our new Private Equity ownership will expect us to reduce R&D headcount costs through a reduction. However, we don’t have any concrete details to make a structured decision on this, and our approach would vary significantly depending on the size of the reduction. Faced with that uncertainty, the policy simply acknowledges the ambiguity and commits to reconsider when more information becomes available: We believe our new ownership will provide a specific target for Research and Development (R&D) operating expenses during the upcoming financial year planning. We will revise these policies again once we have explicit targets, and will delay planning around reductions until we have those numbers to avoid running two overlapping processes. There are two frequent points of confusion when creating policies that are worth addressing directly: Policy is a subset of strategy, rather than the entirety of strategy, because policy is only meaningful in the context of the strategy’s diagnosis. For example, the “N-1 backfill policy” makes sense in the context of new, private equity ownership. The policy wouldn’t work well in a rapidly expanding organization. Any strategy without a policy is useless, but you’ll also find policies without context aren’t worth much either. This is particularly unfortunate, because so often strategies are communicated without those critical sections. Policy describes how tradeoffs should be made, but it doesn’t verify how the tradeoffs are actually being made in practice. The next chapter on operations covers how to inspect an organization’s behavior to ensure policies are followed. When reworking a strategy to be more readable, it often makes sense to merge policy and operation sections together. However, when drafting strategy it’s valuable to keep them separate. Yes, you might use a weekly meeting to review whether the policy is being followed, but whether it’s an effective policy is independent of having such a meeting, and what operational mechanisms you use will vary depending on the number of policies you intend to implement. With this definition in mind, now we can move onto the more interesting discussion of how to set policy. How to set policy Every part of writing a strategy feels hard when you’re doing it, but I personally find that writing policy either feels uncomfortably easy or painfully challenging. It’s never a happy medium. Fortunately, the exploration and diagnosis usually come together to make writing your policy simple: although sometimes that simple conclusion may be a difficult one to swallow. The steps I follow to write a strategy’s policy are: Review diagnosis to ensure it captures the most important themes. It doesn’t need to be perfect, but it shouldn’t have omissions so obvious that you can immediately identify them. Select policies that address the diagnosis. Explicitly match each policy to one or more diagnoses that it addresses. Continue adding policies until every diagnosis is covered. This is a broad instruction, but it’s simpler than it sounds because you’ll typically select from policies identified during your exploration phase. However, there certainly is space to tweak those policies, and to reapply familiar policies to new circumstances. If you do find yourself developing a novel policy, there’s a later section in this chapter, Developing novel policies, that addresses that topic in more detail. Consolidate policies in cases where they overlap or adjoin. For example, two policies about specific teams might be generalized into a policy about all teams in the engineering organization. Backtest policy against recent decisions you’ve made. This is particularly effective if you maintain a decision log in your organization. Mine for conflict once again, much as you did in developing your diagnosis. Emphasize feedback from teams and individuals with a different perspective than your own, but don’t wholly eliminate those that you agree with. Just as it’s easy to crowd out opposing views in diagnosis if you don’t solicit their input, it’s possible to accidentally crowd out your own perspective if you anchor too much on others’ perspectives. Consider refinement if you finish writing, and you just aren’t sure your approach works – that’s fine! Return to the refinement phase by deploying one of the refinement techniques to increase your conviction. Remember that we talk about strategy like it’s done in one pass, but almost all real strategy takes many refinement passes. The steps of writing policy are relatively pedestrian, largely because you’ve done so much of the work already in the exploration, diagnosis, and refinement steps. If you skip those phases, you’d likely follow the above steps for writing policy, but the expected quality of the policy itself would be far lower. How many policies? Addressing the entirety of the diagnosis is often complex, which is why most strategies feature a set of policies rather than just one. The strategy for decomposing a monolithic application is not one policy deciding not to decompose, but a series of four policies: Business units should always operate in their own code repository and monolith. New integrations across business unit monoliths should be done using gRPC. Except for new business unit monoliths, we don’t allow new services. Merge existing services into business-unit monoliths where you can. Four isn’t universally the right number either. It’s simply the number that was required to solve that strategy’s diagnosis. With an excellent diagnosis, your policies will often feel inevitable, and perhaps even boring. That’s great: what makes a policy good is that it’s effective, not that it’s novel or inspiring. Kinds of policies While there are so many policies you can write, I’ve found they generally fall into one of four major categories: approvals, allocations, direction, and guidance. This section introduces those categories. Approvals define the process for making a recurring decision. This might require invoking an architecture advice process, or it might require involving an authority figure like an executive. In the Index post-acquisition integration strategy, there were a number of complex decisions to be made, and the approval mechanism was: Escalations come to paired leads: given our limited shared context across teams, all escalations must come to both Stripe’s Head of Traffic Engineering and Index’s Head of Engineering. This allowed the acquired and acquiring teams to start building trust between each other by ensuring both were consulted before any decision was finalized. On the other hand, the user data access strategy’s approval strategy was more focused on managing corporate risk: Exceptions must be granted in writing by CISO. While our overarching Engineering Strategy states that we follow an advisory architecture process as described in Facilitating Software Architecture, the customer data access policy is an exception and must be explicitly approved, with documentation, by the CISO. Start that process in the #ciso channel. These two different approval processes had different goals, so they made tradeoffs differently. There are so many ways to tweak approval, allowing for many different tradeoffs between safety, productivity, and trust. Allocations describe how resources are split across multiple potential investments. Allocations are the most concrete statement of organizational priority, and also articulate the organization’s belief about how productivity happens in teams. Some companies believe you go fast by swarming more people onto critical problems. Other companies believe you go fast by forcing teams to solve problems without additional headcount. Both can work, and teach you something important about the company’s beliefs. The strategy on Uber’s service migration has two concrete examples of allocation policies. The first describes the Infrastructure engineering team’s allocation between manual provision tasks and investing into creating a self-service provisioning platform: Constrain manual provisioning allocation to maximize investment in self-service provisioning. The service provisioning team will maintain a fixed allocation of one full time engineer on manual service provisioning tasks. We will move the remaining engineers to work on automation to speed up future service provisioning. This will degrade manual provisioning in the short term, but the alternative is permanently degrading provisioning by the influx of new service requests from newly hired product engineers. The second allocation policy is implicitly noted in this strategy’s diagnosis, where it describes the allocation policy in the Engineering organization’s higher altitude strategy: Within infrastructure engineering, there is a team of four engineers responsible for service provisioning today. While our organization is growing at a similar rate as product engineering, none of that additional headcount is being allocated directly to the team working on service provisioning. We do not anticipate this changing. Allocation policies often create a surprising amount of clarity for the team, and I include them in almost every policy I write either explicitly, or implicitly in a higher altitude strategy. Direction provides explicit instruction on how a decision must be made. This is the right tool when you know where you want to go, and exactly the way that you want to get there. Direction is appropriate for problems you understand clearly, and you value consistency more than empowering individual judgment. Direction works well when you need an unambiguous policy that doesn’t leave room for interpretation. For example, Calm’s policy for working in the monolith: We write all code in the monolith. It has been ambiguous if new code (especially new application code) should be written in our JavaScript monolith, or if all new code must be written in a new service outside of the monolith. This is no longer ambiguous: all new code must be written in the monolith. In the rare case that there is a functional requirement that makes writing in the monolith implausible, then you should seek an exception as described below. In that case, the team couldn’t agree on what should go into the monolith. Individuals would often make incompatible decisions, so creating consistency required removing personal judgment from the equation. Sometimes judgment is the issue, and sometimes consistency is difficult due to misaligned incentives. A good example of this comes in strategy on working with new Private Equity ownership: We will move to an “N-1” backfill policy, where departures are backfilled with a less senior level. We will also institute a strict maximum of one Principal Engineer per business unit. It’s likely that hiring managers would simply ignore this backfill policy if it was stated more softly, although sometimes less forceful policies are useful. Guidance provides a recommendation about how a decision should be made. Guidance is useful when there’s enough nuance, ambiguity, or complexity that you can explain the desired destination, but you can’t mandate the path to reaching it. One example of guidance comes from the Index acquisition integration strategy: Minimize changes to tokenization environment: because point-of-sale devices directly work with customer payment details, the API that directly supports the point-of-sale device must live within our secured environment where payment details are stored. However, any other functionality must not be added to our tokenization environment. This might read like direction, but it’s clarifying the desired outcome of avoiding unnecessary complexity in the tokenization environment. However, it’s not able to articulate what complexity is necessary, so ultimately it’s guidance because it requires significant judgment to interpret. A second example of guidance comes in the strategy on decomposing a monolithic codebase: Merge existing services into business-unit monoliths where you can. We believe that each choice to move existing services back into a monolith should be made “in the details” rather than from a top-down strategy perspective. Consequently, we generally encourage teams to wind down their existing services outside of their business unit’s monolith, but defer to teams to make the right decision for their local context. This is another case of knowing the desired outcome, but encountering too much uncertainty to direct the team on how to get there. If you ask five engineers about whether it’s possible to merge a given service back into a monolithic codebase, they’ll probably disagree. That’s fine, and highlights the value of guidance: it makes it possible to make incremental progress in areas where more concrete direction would cause confusion. When you’re working on a strategy’s policy section, it’s important to consider all of these categories. Which feel most natural to use will vary depending on your team and role, but they’re all usable: If you’re a developer productivity team, you might have to lean heavily on guidance in your policies and increased support for that guidance within the details of your platform. If you’re an executive, you might lean heavily on direction. Indeed, you might lean too heavily on direction, where guidance often works better for areas where you understand the direction but not the path. If you’re a product engineering organization, you might have to narrow the scope of your direction to the engineers within that organization to deal with the realities of complex cross-organization dynamics. Finally, if you have a clear approach you want to take that doesn’t fit cleanly into any of these categories, then don’t let this framework dissuade you. Give it a try, and adapt if it doesn’t initially work out. Maintaining strategy altitude The chapter on when to write engineering strategy introduced the concept of strategy altitude, which is being deliberate about where certain kinds of policies are created within your organization. Without repeating that section in its entirety, it’s particularly relevant when you set policy to consider how your new policies eliminate flexibility within your organization. Consider these two somewhat opposing strategies: Stripe’s Sorbet strategy only worked in an organization that enforced the use of a single programming language across (essentially) all teams Uber’s service migration strategy worked well in an organization that was unwilling to enforce consistent programming language adoption across teams Stripe’s organization-altitude policy took away the freedom of individual teams to select their preferred technology stack. In return, they unlocked the ability to centralize investment in a powerful way. Uber went the opposite way, unlocking the ability of teams to pick their preferred technology stack, while significantly reducing their centralized teams’ leverage. Both altitudes make sense. Both have consequences. Criteria for effective policies In The Engineering Executive’s Primer’s chapter on engineering strategy, I introduced three criteria for evaluating policies. They ought to be applicable, enforced, and create leverage. Defining those a bit: Applicable: it can be used to navigate complex, real scenarios, particularly when making tradeoffs. Enforced: teams will be held accountable for following the guiding policy. Create Leverage: create compounding or multiplicative impact. The last of these three, create leverage, made sense in the context of a book about engineering executives, but probably doesn’t make as much sense here. Some policies certainly should create leverage (e.g. empower developer experience team by restricting new services), but others might not (e.g. moving to an N-1 backfill policy). Outside the executive context, what’s important isn’t necessarily creating leverage, but that a policy solves for part of the diagnosis. That leaves the other two–being applicable and enforced–both of which are necessary for a policy to actually address the diagnosis. Any policy which you can’t determine how to apply, or aren’t willing to enforce, simply won’t be useful. Let’s apply these criteria to a handful of potential policies. First let’s think about policies we might write to improve the talent density of our engineering team: “We only hire world-class engineers.” This isn’t applicable, because it’s unclear what a world-class engineer means. Because there’s no mutually agreeable definition in this policy, it’s also not consistently enforceable. “We only hire engineers that get at least one ‘strong yes’ in scorecards.” This is applicable, because there’s a clear definition. This is enforceable, depending on the willingness of the organization to reject seemingly good candidates who don’t happen to get a strong yes. Next, let’s think about a policy regarding code reuse within a codebase: “We follow a strict Don’t Repeat Yourself policy in our codebase.” There’s room for debate within a team about whether two pieces of code are truly duplicative, but this is generally applicable. Because there’s room for debate, it’s a very context specific determination to decide how to enforce a decision. “Code authors are responsible for determining if their contributions violate Don’t Repeat Yourself, and rewriting them if they do.” This is much more applicable, because now there’s only a single person’s judgment to assess the potential repetition. In some ways, this policy is also more enforceable, because there’s no longer any ambiguity around who is deciding whether a piece of code is a repetition. The challenge is that enforceability now depends on one individual, and making this policy effective will require holding individuals accountable for the quality of their judgement. An organization that’s unwilling to distinguish between good and bad judgment won’t get any value out of the policy. This is a good example of how a good policy in one organization might become a poor policy in another. If you ever find yourself wanting to include a policy that for some reason either can’t be applied or can’t be enforced, stop to ask yourself what you’re trying to accomplish and ponder if there’s a different policy that might be better suited to that goal. Developing novel policies My experience is that there are vanishingly few truly novel policies to write. There’s almost always someone else has already done something similar to your intended approach. Calm’s engineering strategy is such a case: the details are particular to the company, but the general approach is common across the industry. The most likely place to find truly novel policies is during the adoption phase of a new widespread technology, such as the rise of ubiquitous mobile phones, cloud computing, or large language models. Even then, as explored in the strategy for adopting large-language models, the new technology can be engaged with as a generic technology: Develop an LLM-backed process for reactivating departed and suspended drivers in mature markets. Through modeling our driver lifecycle, we determined that improving onboarding time will have little impact on the total number of active drivers. Instead, we are focusing on mechanisms to reactivate departed and suspended drivers, which is the only opportunity to meaningfully impact active drivers. You could simply replace “LLM” with “data-driven” and it would be equally readable. In this way, policy can generally sidestep areas of uncertainty by being a bit abstract. This avoids being overly specific about topics you simply don’t know much about. However, even if your policy isn’t novel to the industry, it might still be novel to you or your organization. The steps that I’ve found useful to debug novel policies are the same steps as running a condensed version of the strategy process, with a focus on exploration and refinement: Collect a number of similar policies, with a focus on how those policies differ from the policy you are creating Create a systems model to articulate how this policy will work, and also how it will differ from the similar policies you’re considering Run a strategy testing cycle for your proto-policy to discover any unknown-unknowns about how it works in practice Whether you run into this scenario is largely a function of the extent of your, and your organization’s, experience. Early in my career, I found myself doing novel (for me) strategy work very frequently, and these days I rarely find myself doing novel work, instead focusing on adaptation of well-known policies to new circumstances. Are competing policy proposals an anti-pattern? When creating policy, you’ll often have to engage with the question of whether you should develop one preferred policy or a series of potential strategies to pick from. Developing these is a useful stage of setting policy, but rather than helping you refine your policy, I’d encourage you to think of this as exposing gaps in your diagnosis. For example, when Stripe developed the Sorbet ruby-typing tooling, there was debate between two policies: Should we build a ruby-typing tool to allow a centralized team to gradually migrate the company to a typed codebase? Should we migrate the codebase to a preexisting strongly typed language like Golang or Java? These were, initially, equally valid hypotheses. It was only by clarifying our diagnosis around resourcing that it became clear that incurring the bulk of costs in a centralized team was clearly preferable to spreading the costs across many teams. Specifically, recognizing that we wanted to prioritize short-term product engineering velocity, even if it led to a longer migration overall. If you do develop multiple policy options, I encourage you to move the alternatives into an appendix rather than including them in the core of your strategy document. This will make it easier for readers of your final version to understand how to follow your policies, and they are the most important long-term user of your written strategy. Recognizing constraints A similar problem to competing solutions is developing a policy that you cannot possibly fund. It’s easy to get enamored with policies that you can’t meaningfully enforce, but that’s bad policy, even if it would work in an alternate universe where it was possible to enforce or resource it. To consider a few examples: The strategy for controlling access to user data might have proposed requiring manual approval by a second party of every access to customer data. However, that would have gone nowhere. Our approach to Uber’s service migration might have required more staffing for the infrastructure engineering team, but we knew that wasn’t going to happen, so it was a meaningless policy proposal to make. The strategy for navigating private equity ownership might have argued that new ownership should not hold engineering accountable to a new standard on spending. But they would have just invalidated that strategy in the next financial planning period. If you find a policy that contemplates an impractical approach, it doesn’t only indicate that the policy is a poor one, it also suggests your policy is missing an important pillar. Rather than debating the policy options, the fastest path to resolution is to align on the diagnosis that would invalidate potential paths forward. In cases where aligning on the diagnosis isn’t possible, for example because you simply don’t understand the possibilities of a new technology as encountered in the strategy for adopting LLMs, then you’ve typically found a valuable opportunity to use strategy refinement to build alignment. Dealing with missing strategies At a recent company offsite, we were debating which policies we might adopt to deal with annual plans that kept getting derailed after less than a month. Someone remarked that this would be much easier if we could get the executive team to commit to a clearer, written strategy about which business units we were prioritizing. They were, of course, right. It would be much easier. Unfortunately, it goes back to the problem we discussed in the diagnosis chapter about reframing blockers into diagnosis. If a strategy from the company or a peer function is missing, the empowering thing to do is to include the absence in your diagnosis and move forward. Sometimes, even when you do this, it’s easy to fall back into the belief that you cannot set a policy because a peer function might set a conflicting policy in the future. Whether you’re an executive or an engineer, you’ll never have the details you want to make the ideal policy. Meaningful leadership requires taking meaningful risks, which is never something that gets comfortable. Summary After working through this chapter, you know how to develop policy, how to assemble policies to solve your diagnosis, and how to avoid a number of the frequent challenges that policy writers encounter. At this point, there’s only one phase of strategy left to dig into, operating the policies you’ve created.

21 hours ago • 3 votes

Fast and random sampling in SQLite

I was building a small feature for the Flickr Commons Explorer today: show a random selection of photos from the entire collection. I wanted a fast and varied set of photos. This meant getting a random sample of rows from a SQLite table (because the Explorer stores all its data in SQLite). I’m happy with the code I settled on, but it took several attempts to get right. Approach #1: ORDER BY RANDOM() My first attempt was pretty naïve – I used an ORDER BY RANDOM() clause to sort the table, then limit the results: SELECT * FROM photos ORDER BY random() LIMIT 10 This query works, but it was slow – about half a second to sample a table with 2 million photos (which is very small by SQLite standards). This query would run on every request for the homepage, so that latency is unacceptable. It’s slow because it forces SQLite to generate a value for every row, then sort all the rows, and only then does it apply the limit. SQLite is fast, but there’s only so fast you can sort millions of values. I found a suggestion from Stack Overflow user Ali to do a random sort on the id column first, pick my IDs from that, and only fetch the whole row for the photos I’m selecting: SELECT * FROM photos WHERE id IN ( SELECT id FROM photos ORDER BY RANDOM() LIMIT 10 ) This means SQLite only has to load the rows it’s returning, not every row in the database. This query was over three times faster – about 0.15s – but that’s still slower than I wanted. Approach #2: WHERE rowid > (…) Scrolling down the Stack Overflow page, I found an answer by Max Shenfield with a different approach: SELECT * FROM photos WHERE rowid > ( ABS(RANDOM()) % (SELECT max(rowid) FROM photos) ) LIMIT 10 The rowid is a unique identifier that’s used as a primary key in most SQLite tables, and it can be looked up very quickly. SQLite automatically assigns a unique rowid unless you explicitly tell it not to, or create your own integer primary key. This query works by picking a point between the biggest and smallest rowid values used in the table, then getting the rows with rowids which are higher than that point. If you want to know more, Max’s answer has a more detailed explanation. This query is much faster – around 0.0008s – but I didn’t go this route. The result is more like a random slice than a random sample. In my testing, it always returned contiguous rows – 101, 102, 103, … – which isn’t what I want. The photos in the Commons Explorer database were inserted in upload order, so photos with adjacent row IDs were uploaded at around the same time and are probably quite similar. I’d get one photo of an old plane, then nine more photos of other planes. I want more variety! (This behaviour isn’t guaranteed – if you don’t add an ORDER BY clause to a SELECT query, then the order of results is undefined. SQLite is returning rows in rowid order in my table, and a quick Google suggests that’s pretty common, but that may not be true in all cases. It doesn’t affect whether I want to use this approach, but I mention it here because I was confused about the ordering when I read this code.) Approach #3: Select random rowid values outside SQLite Max’s answer was the first time I’d heard of rowid, and it gave me an idea – what if I chose random rowid values outside SQLite? This is a less “pure” approach because I’m not doing everything in the database, but I’m happy with that if it gets the result I want. Here’s the procedure I came up with: Create an empty list to store our sample. Find the highest rowid that’s currently in use: sqlite> SELECT MAX(rowid) FROM photos; 1913389 Use a random number generator to pick a rowid between 1 and the highest rowid: >>> import random >>> random.randint(1, max_rowid) 196476 If we’ve already got this rowid, discard it and generate a new one. (The rowid is a signed, 64-bit integer, so the minimum possible value is always 1.) Look for a row with that rowid: SELECT * FROM photos WHERE rowid = 196476 If such a row exists, add it to our sample. If we have enough items in our sample, we’re done. Otherwise, return to step 3 and generate another rowid. If such a row doesn’t exist, return to step 3 and generate another rowid. This requires a bit more code, but it returns a diverse sample of photos, which is what I really care about. It’s a bit slower, but still plenty fast enough (about 0.001s). This approach is best for tables where the rowid values are mostly contiguous – it would be slower if there are lots of rowids between 1 and the max that don’t exist. If there are large gaps in rowid values, you might try multiple missing entries before finding a valid row, slowing down the query. You might want to try something different, like tracking valid rowid values separately. This is a good fit for my use case, because photos don’t get removed from Flickr Commons very often. Once a row is written, it sticks around, and over 97% of the possible rowid values do exist. Summary Here are the four approaches I tried: Approach Performance (for 2M rows) Notes ORDER BY RANDOM() ~0.5s Slowest, easiest to read WHERE id IN (SELECT id …) ~0.15s Faster, still fairly easy to understand WHERE rowid > ... ~0.0008s Returns clustered results Random rowid in Python ~0.001s Fast and returns varied results, requires code outside SQL I’m using the random rowid in Python in the Commons Explorer, trading code complexity for speed. I’m using this random sample to render a web page, so it’s important that it returns quickly – when I was testing ORDER BY RANDOM(), I could feel myself waiting for the page to load. But I’ve used ORDER BY RANDOM() in the past, especially for asynchronous data pipelines where I don’t care about absolute performance. It’s simpler to read and easier to see what’s going on. Now it’s your turn – visit the Commons Explorer and see what random gems you can find. Let me know if you spot anything cool! [If the formatting of this post looks odd in your feed reader, visit the original article]

12 hours ago • 1 votes

Choosing Languages

yesterday • 3 votes

05 · Syncing Keyhive

How we sync Keyhive and Automerge

yesterday • 1 votes

New here?