Full Width [alt+shift+f] Shortcuts [alt+shift+k]
Sign Up [alt+shift+s] Log In [alt+shift+l]
27
It’s been two years since I wrote #define CTO, in which I documented my quest for a role where I could have scalable impact by writing code. I’ve finally found that role, though not by seeking it — instead, I sought out a problem more important to me than my role within it, brought together the right people, and found that I can best make them effective by writing code. Formation # In August 2015, OpenAI was just an idea articulated over a dinner with Elon Musk, Sam Altman, Ilya Sutskever, me, and a handful of others. We’d each come to the dinner with our own ideas, but Elon and Sam had a crisp vision of building safe AI in a project dedicated to benefiting humanity. I wanted to contribute however I could. Sam and I started rallying a team to turn this idea into reality. We were missing a core ingredient: we needed an AI technical visionary, someone whose intuition and ideas we could follow to the breakthroughs. Ilya Sutskever was clearly the best person in the world for this. Ilya is...
over a year ago

Improve your reading experience

Logged in users get linked directly to articles resulting in a better reading experience. Please login for free, it takes less than 1 minute.

More from Greg Brockman

It's time to become an ML engineer

AI has recently crossed a utility threshold, where cutting-edge models such as GPT-3, Codex, and DALL-E 2 are actually useful and can perform tasks computers cannot do any other way. The act of producing these models is an exploration of a new frontier, with the discovery of unknown capabilities, scientific progress, and incredible product applications as the rewards. And perhaps most exciting for me personally, because the field is fundamentally about creating and studying software systems, great engineers are able to contribute at the same level as great researchers to future progress. “A self-learning AI system.” by DALL-E 2. I first got into software engineering because I wanted to build large-scale systems that could have a direct impact on people’s lives. I attended a math research summer program shortly after I started programming, and my favorite result of the summer was a scheduling app I built for people to book time with the professor. Specifying every detail of how a program should work is hard, and I’d always dreamed of one day putting my effort into hypothetical AI systems that could figure out the details for me. But after taking one look at the state of the art in AI in 2008, I knew it wasn’t going to work any time soon and instead started building infrastructure and product for web startups. DALL-E 2’s rendition of “The two great pillars of the house of artificial intelligence” (which according to my co-founder Ilya Sutskever are great engineering, and great science using this engineering) It’s now almost 15 years later, and the vision of systems which can learn their own solutions to problems is becoming incrementally more real. And perhaps most exciting is the underlying mechanism by which it’s advancing — at OpenAI, and the field generally, precision execution on large-scale models is a force multiplier on AI progress, and we need more people with strong software skills who can deliver these systems. This is because we are building AI models out of unprecedented amounts of compute; these models in turn have unprecedented capabilities, we can discover new phenomena and explore the limits of what these models can and cannot do, and then we use all these learnings to build the next model. “Harnessing the most compute in the known universe” by DALL-E 2 Harnessing this compute requires deep software skills and the right kind of machine learning knowledge. We need to coordinate lots of computers, build software frameworks that allow for hyperoptimization in some cases and flexibility in others, serve these models to customers really fast (which is what I worked on in 2020), and make it possible for a small team to manage a massive system (which is what I work on now). Engineers with no ML background can contribute from the day they join, and the more ML they pick up the more impact they have. The OpenAI environment makes it relatively easy to absorb the ML skills, and indeed, many of OpenAI’s best engineers transferred from other fields. All that being said, AI is not for every software engineer. I’ve seen about a 50-50 success rate of engineers entering this field. The most important determiner is a specific flavor of technical humility. Many dearly-held intuitions from other domains will not apply to ML. The engineers who make the leap successfully are happy to be wrong (since it means they learned something), aren’t afraid not to know something, and don’t push solutions that others resist until they’ve gathered enough intuition to know for sure that it matches the domain. “A beaver who has humbly recently become a machine learning engineer” by DALL-E 2 I believe that AI research is today by far the most impactful place for engineers who want to build useful systems to be working, and I expect this statement to become only more true as progress continues. If you’d like to work on creating the next generation of AI models, email me (gdb@openai.com) with any evidence of exceptional accomplishment in software engineering.

over a year ago 30 votes
How I became a machine learning practitioner

For the first three years of OpenAI, I dreamed of becoming a machine learning expert but made little progress towards that goal. Over the past nine months, I’ve finally made the transition to being a machine learning practitioner. It was hard but not impossible, and I think most people who are good programmers and know (or are willing to learn) the math can do it too. There are many online courses to self-study the technical side, and what turned out to be my biggest blocker was a mental barrier — getting ok with being a beginner again. Studying machine learning during the 2018 holiday season. Early days # A founding principle of OpenAI is that we value research and engineering equally — our goal is to build working systems that solve previously impossible tasks, so we need both. (In fact, our team is comprised of 25% people primarily using software skills, 25% primarily using machine learning skills, and 50% doing a hybrid of the two.) So from day one of OpenAI, my software skills were always in demand, and I kept procrastinating on picking up the machine learning skills I wanted. After helping build OpenAI Gym, I was called to work on Universe. And as Universe was winding down, we decided to start working on Dota — and we needed someone to turn the game into a reinforcement learning environment before any machine learning could begin. Dota # Turning such a complex game into a research environment without source code access was awesome work, and the team’s excitement every time I overcame a new obstacle was deeply validating. I figured out how to break out of the game’s Lua sandbox, LD_PRELOAD in a Go GRPC server to programmatically control the game, incrementally dump the whole game state into a Protobuf, and build a Python library and abstractions with future compatibility for the many different multiagent configurations we might want to use. But I felt half blind. At Stripe, though I gravitated towards infrastructure solutions, I could make changes anywhere in the stack since I knew the product code intimately. In Dota, I was constrained to looking at all problems through a software lens, which sometimes meant I tried to solve hard problems that could be avoided by just doing the machine learning slightly differently. I wanted to be like my teammates Jakub Pachocki and Szymon Sidor, who had made the core breakthrough that powered our Dota bot. They had questioned the common wisdom within OpenAI that reinforcement algorithms didn’t scale. They wrote a distributed reinforcement learning framework called Rapid and scaled it exponentially every two weeks or so, and we never hit a wall with it. I wanted to be able to make critical contributions like that which combined software and machine learning skills. Szymon on the left; Jakub on the right. In July 2017, it looked like I might have my chance. The software infrastructure was stable, and I began work on a machine learning project. My goal was to use behavioral cloning to teach a neural network from human training data. But I wasn’t quite prepared for just how much I would feel like a beginner. I kept being frustrated by small workflow details which made me uncertain if I was making progress, such as not being certain which code a given experiment had used or realizing I needed to compare against a result from last week that I hadn’t properly archived. To make things worse, I kept discovering small bugs that had been corrupting my results the whole time. I didn’t feel confident in my work, but to make it worse, other people did. People would mention how how hard behavioral cloning from human data is. I always made sure to correct them by pointing out that I was a newbie, and this probably said more about my abilities than the problem. It all briefly felt worth it when my code made it into the bot, as Jie Tang used it as the starting point for creep blocking which he then fine-tuned with reinforcement learning. But soon Jie figured out how to get better results without using my code, and I had nothing to show for my efforts. I never tried machine learning on the Dota project again. Time out # After we lost two games in The International in 2018, most observers thought we’d topped out what our approach could do. But we knew from our metrics that we were right on the edge of success and mostly needed more training. This meant the demands on my time had relented, and in November 2018, I felt I had an opening to take a gamble with three months of my time. Team members in high spirits after losing our first game at The International. I learn best when I have something specific in mind to build. I decided to try building a chatbot. I started self-studying the curriculum we developed for our Fellows program, selecting only the NLP-relevant modules. For example, I wrote and trained an LSTM language model and then a Transformer-based one. I also read up on topics like information theory and read many papers, poring over each line until I fully absorbed it. It was slow going, but this time I expected it. I didn’t experience flow state. I was reminded of how I’d felt when I just started programming, and I kept thinking of how many years it had taken to achieve a feeling of mastery. I honestly wasn’t confident that I would ever become good at machine learning. But I kept pushing because… well, honestly because I didn’t want to be constrained to only understanding one part of my projects. I wanted to see the whole picture clearly. My personal life was also an important factor in keeping me going. I’d begun a relationship with someone who made me feel it was ok if I failed. I spent our first holiday season together beating my head against the machine learning wall, but she was there with me no matter how many planned activities it meant skipping. One important conceptual step was overcoming a barrier I’d been too timid to do with Dota: make substantive changes to someone else’s machine learning code. I fine-tuned GPT-1 on chat datasets I’d found, and made a small change to add my own naive sampling code. But it became so painfully slow as I tried to generate longer messages that my frustration overwhelmed my fear, and I implemented GPU caching — a change which touched the entire model. I had to try a few times, throwing out my changes as they exceeded the complexity I could hold in my head. By the time I got it working a few days later, I realized I’d learned something that I would have previously thought impossible: I now understood how the whole model was put together, down to small stylistic details like how the codebase elegantly handles TensorFlow variable scopes. Retooled # After three months of self-study, I felt ready to work on an actual project. This was also the first point where I felt I could benefit from the many experts we have at OpenAI, and I was delighted when Jakub and my co-founder Ilya Sutskever agreed to advise me. Ilya singing karaoke at our company offsite. We started to get very exciting results, and Jakub and Szymon joined the project full-time. I feel proud every time I see a commit from them in the machine learning codebase I’d started. I’m starting to feel competent, though I haven’t yet achieved mastery. I’m seeing this reflected in the number of hours I can motivate myself to spend focused on doing machine learning work — I’m now around 75% of the number of coding hours from where I’ve been historically. But for the first time, I feel that I’m on trajectory. At first, I was overwhelmed by the seemingly endless stream of new machine learning concepts. Within the first six months, I realized that I could make progress without constantly learning entirely new primitives. I still need to get more experience with many skills, such as initializing a network or setting a learning rate schedule, but now the work feels incremental rather than potentially impossible. From our Fellows and Scholars programs, I’d known that software engineers with solid fundamentals in linear algebra and probability can become machine learning engineers with just a few months of self study. But somehow I’d convinced myself that I was the exception and couldn’t learn. But I was wrong — even embedded in the middle of OpenAI, I couldn’t make the transition because I was unwilling to become a beginner again. You’re probably not an exception either. If you’d like to become a deep learning practitioner, you can. You need to give yourself the space and time to fail. If you learn from enough failures, you’ll succeed — and it’ll probably take much less time than you expect. At some point, it does become important to surround yourself by existing experts. And that is one place where I’m incredibly lucky. If you’re a great software engineer who reaches that point, keep in mind there’s a way you can be surrounded by the same people as I am — apply to OpenAI!

over a year ago 29 votes
OpenAI Five Finals Intro

The text of my speech introducing OpenAI Five at Saturday’s OpenAI Five Finals event, where our AI beat the world champions at Dota 2: “Welcome everyone. This is an exciting day. First, this is an historic moment: this will be the first time that an AI has even attempted to play the world champions in an esports game. OG is simply on another level relative to other teams we’ve played. So we don’t know what’s going to happen, but win or lose, these will be games to remember. And you know, OpenAI Five and DeepMind’s very impressive StarCraft bot This event is really about something bigger than who wins or loses: letting people connect with the strange, exotic, yet tangible intelligences produced by today’s rapidly progressing AI technology. We’re all used to computer programs which have been meticulously coded by a human programmer. Do one thing that the human didn’t anticipate, and the program will break. We think of our computers as unthinking machines which can’t innovate, can’t be creative, can’t truly understand. But to play Dota, you need to do all these things. So we needed to do something different. OpenAI Five is powered by deep reinforcement learning — meaning that we didn’t code in how to play Dota. We instead coded in the how to learn. Five tries out random actions, and learns from a reward or punishment. In its 10 months of training, its experienced 45,000 years of Dota gameplay against itself. The playstyle it has devised are its own — they are truly creative and dreamed up by our computer — and so from Five’s perspective, today’s games are going to its first encounter with an alien intelligence (no offense to OG!). The beauty of this technology is that our learning code doesn’t know it’s meant for Dota. That makes it general purpose with amazing potential to benefit our lives. Last year we used it to control a robotic hand that no one could program. And we expect to see similar technology in new interactive systems, from elderly care robots to creative assistants to other systems we can’t dream of yet. This is the final public event for OpenAI Five, but we expect to do other Dota projects in the future. I want to thank the incredible team at OpenAI, everyone who worked directly on this project or cheered us on. I want to thank those who have supported the project: Valve, dozens of test teams, today’s casters, and yes, even all the commenters on Reddit. And I want to give massive thanks today to our fantastic guests OG who have taken time out of their tournament schedule to be here today. I hope you enjoy the show — and just to keep things in perspective, no matter how surprising the AIs are to us, know that we’re even more surprising to them!”

over a year ago 28 votes
The OpenAI Mission

This post is co-written by Greg Brockman (left) and Ilya Sutskever (right). We’ve been working on OpenAI for the past three years. Our mission is to ensure that artificial general intelligence (AGI) — which we define as automated systems that outperform humans at most economically valuable work — benefits all of humanity. Today we announced a new legal structure for OpenAI, called OpenAI LP, to better pursue this mission — in particular to raise more capital as we attempt to build safe AGI and distribute its benefits. In this post, we’d like to help others understand how we think about this mission. Why now? # The founding vision of the field of AI was “… to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it”, and to eventually build a machine that thinks — that is, an AGI. But over the past 60 years, progress stalled multiple times and people started thinking of AI as a field that wouldn’t deliver. Since 2012, deep learning has generated sustained progress in many domains using a small simple set of tools, which have the following properties: Generality: deep learning tools are simple, yet they apply to many domains, such as vision, speech recognition, speech synthesis, text synthesis, image synthesis, translation, robotics, and game playing. Competence: today, the only way to get competitive results on most “AI-type problems” is through the use of deep learning techniques. Scalability: good old fashioned AI was able to produce exciting demos, but its techniques had difficulty scaling to harder problems. But in deep learning, more computational power and more data leads to better results. It has also proven easy (if costly) to rapidly increase the amount of compute productively used by deep learning experiments. The rapid progress of useful deep learning systems with these properties makes us feel that it’s reasonable to start taking AGI seriously — though it’s hard to know how far away it is. The impact of AGI # Just like a computer today, an AGI will be applicable to a wide variety of tasks — and just like computers in 1900 or the Internet in 1950, it’s hard to describe (or even predict) the kind of impact AGI will have. But to get a sense, imagine a computer system which can do the following activities with minimal human input: Make a scientific breakthrough at the level of the best scientists Productize that breakthrough and build a company, with a skill comparable to the best entrepreneurs Rapidly grow that company and manage it at large scale The upside of such a computer system is enormous — for an illustrative example, an AGI following the pattern above could produce amazing healthcare applications deployed at scale. Imagine a network of AGI-powered computerized doctors that accumulates a superhuman amount of clinical experience, allowing it to produce excellent diagnoses, deeply understand the nuanced effect of various treatments in lots of conditions, and greatly reduce the human error factor of healthcare — all for very low cost and accessible to everyone. Risks # We already live in a world with entities that surpass individual human abilities, which we call companies. If working on the right goals in the right way, companies can produce huge amounts of value and improve lives. But if not properly checked, they can also cause damage, like logging companies that cut down rain forests, cigarette companies that get children smoking, or scams like Ponzi schemes. We think of AGI as being like a hyper-effective company, with commensurate benefits and risks. We are concerned about AGI pursuing goals misspecified by its operator, malicious humans subverting a deployed AGI, or an out-of-control economy that grows without resulting in improvements to human lives. And because it’s hard to change powerful systems — just think about how hard it’s been to add security to the Internet — once they’ve been deployed, we think it’s important to address AGI’s safety and policy risks before it is created. OpenAI’s mission is to figure out how to get the benefits of AGI and mitigate the risks — and make sure those benefits accrue to all of humanity. The future is uncertain, and there are many ways in which our predictions could be incorrect. But if they turn out to be right, this mission will be critical. If you’d like to work on this mission, we’re hiring! About us # Ilya: I’ve been working on deep learning for 16 years. It was fun to witness deep learning transform from being a marginalized subfield of AI into one the most important family of scientific advances in recent history. As deep learning was getting more powerful, I realized that AGI might become a reality on a timescale relevant to my lifetime. And given AGI’s massive upside and significant risks, I want to maximize the positive parts of this impact and minimize the negative. Greg: Technology causes change, both positive and negative. AGI is the most extreme kind of technology that humans will ever create, with extreme upside and downside. I work on OpenAI because making AGI go well is the most important problem I can imagine contributing towards. Today I try to spend most of my time on technical work, and also work to spark better public discourse about AGI and related topics.

over a year ago 26 votes
OpenAI Five intro

The text of my speech introducing OpenAI Five at yesterday’s Benchmark event: “We’re here to watch humans and AI play Dota, but today’s match will have implications for the world. OpenAI’s mission is to ensure that when we can build machines as smart as humans, they will benefit all of humanity. That means both pushing the limits of what’s possible and ensuring future systems are safe and aligned with human values. We work on Dota because it is a great training ground for AI: it is one of the most complicated games, involving teamwork, real time strategy, imperfect information, and an astronomical combinations of heroes and items. We can’t program a solution, so Five learns by playing 180 years of games against itself every day — sadly that means we can’t learn from the players up here unless they played for a few decades. It’s powered by 5 artificial neural networks which act like an artificial intuition. Five’s neural networks are about the size of the brain of an ant — still far from what we all have in our heads. One year ago, we beat the world’s top professionals at 1v1 Dota. People thought 5v5 would be totally out of reach. 1v1 requires mechanics and positioning; people did not expect the same system to learn strategy. But our AI system can learn problems it was not even designed to solve — we just used the same technology to learn to control a robotic hand — something no one could program. The computational power for OpenAI Five would have been impractical two years ago. But the availability of computation for AI has been increasing exponentially, doubling every 3.5 months since 2012, and one day technologies like this will become commonplace. Feel free to root for either team. Either way, humanity wins.” I’m very excited to see where the upcoming months of OpenAI Five development and testing take us.

over a year ago 28 votes

More in programming

ChatGPT Would be a Decent Policy Advisor

Revealed: How the UK tech secretary uses ChatGPT for policy advice by Chris Stokel-Walker for the New Scientist

16 hours ago 3 votes
Setting policy for strategy.

This book’s introduction started by defining strategy as “making decisions.” Then we dug into exploration, diagnosis, and refinement: three chapters where you could argue that we didn’t decide anything at all. Clarifying the problem to be solved is the prerequisite of effective decision making, but eventually decisions do have to be made. Here in this chapter on policy, and the following chapter on operations, we finally start to actually make some decisions. In this chapter, we’ll dig into: How we define policy, and how setting policy differs from operating policy as discussed in the next chapter The structured steps for setting policy How many policies should you set? Is it preferable to have one policy, many policies, or does it not matter much either way? Recurring kinds of policies that appear frequently in strategies Why it’s valuable to be intentional about your strategy’s altitude, and how engineers and executives generally maintain different altitudes in their strategies Criteria to use for evaluating whether your policies are likely to be impactful How to develop novel policies, and why it’s rare Why having multiple bundles of alternative policies is generally a phase in strategy development that indicates a gap in your diagnosis How policies that ignore constraints sound inspirational, but accomplish little Dealing with ambiguity and uncertainty created by missing strategies from cross-functional stakeholders By the end, you’ll be ready to evaluate why an existing strategy’s policies are struggling to make an impact, and to start iterating on policies for strategy of your own. This is an exploratory, draft chapter for a book on engineering strategy that I’m brainstorming in #eng-strategy-book. As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts. What is policy? Policy is interpreting your diagnosis into a concrete plan. That plan will be a collection of decisions, tradeoffs, and approaches. They’ll range from coding practices, to hiring mandates, to architectural decisions, to guidance about how choices are made within your organization. An effective policy solves the entirety of the strategy’s diagnosis, although the diagnosis itself is encouraged to specify which aspects can be ignored. For example, the strategy for working with private equity ownership acknowledges in its diagnosis that they don’t have clear guidance on what kind of reduction to expect: Based on general practice, it seems likely that our new Private Equity ownership will expect us to reduce R&D headcount costs through a reduction. However, we don’t have any concrete details to make a structured decision on this, and our approach would vary significantly depending on the size of the reduction. Faced with that uncertainty, the policy simply acknowledges the ambiguity and commits to reconsider when more information becomes available: We believe our new ownership will provide a specific target for Research and Development (R&D) operating expenses during the upcoming financial year planning. We will revise these policies again once we have explicit targets, and will delay planning around reductions until we have those numbers to avoid running two overlapping processes. There are two frequent points of confusion when creating policies that are worth addressing directly: Policy is a subset of strategy, rather than the entirety of strategy, because policy is only meaningful in the context of the strategy’s diagnosis. For example, the “N-1 backfill policy” makes sense in the context of new, private equity ownership. The policy wouldn’t work well in a rapidly expanding organization. Any strategy without a policy is useless, but you’ll also find policies without context aren’t worth much either. This is particularly unfortunate, because so often strategies are communicated without those critical sections. Policy describes how tradeoffs should be made, but it doesn’t verify how the tradeoffs are actually being made in practice. The next chapter on operations covers how to inspect an organization’s behavior to ensure policies are followed. When reworking a strategy to be more readable, it often makes sense to merge policy and operation sections together. However, when drafting strategy it’s valuable to keep them separate. Yes, you might use a weekly meeting to review whether the policy is being followed, but whether it’s an effective policy is independent of having such a meeting, and what operational mechanisms you use will vary depending on the number of policies you intend to implement. With this definition in mind, now we can move onto the more interesting discussion of how to set policy. How to set policy Every part of writing a strategy feels hard when you’re doing it, but I personally find that writing policy either feels uncomfortably easy or painfully challenging. It’s never a happy medium. Fortunately, the exploration and diagnosis usually come together to make writing your policy simple: although sometimes that simple conclusion may be a difficult one to swallow. The steps I follow to write a strategy’s policy are: Review diagnosis to ensure it captures the most important themes. It doesn’t need to be perfect, but it shouldn’t have omissions so obvious that you can immediately identify them. Select policies that address the diagnosis. Explicitly match each policy to one or more diagnoses that it addresses. Continue adding policies until every diagnosis is covered. This is a broad instruction, but it’s simpler than it sounds because you’ll typically select from policies identified during your exploration phase. However, there certainly is space to tweak those policies, and to reapply familiar policies to new circumstances. If you do find yourself developing a novel policy, there’s a later section in this chapter, Developing novel policies, that addresses that topic in more detail. Consolidate policies in cases where they overlap or adjoin. For example, two policies about specific teams might be generalized into a policy about all teams in the engineering organization. Backtest policy against recent decisions you’ve made. This is particularly effective if you maintain a decision log in your organization. Mine for conflict once again, much as you did in developing your diagnosis. Emphasize feedback from teams and individuals with a different perspective than your own, but don’t wholly eliminate those that you agree with. Just as it’s easy to crowd out opposing views in diagnosis if you don’t solicit their input, it’s possible to accidentally crowd out your own perspective if you anchor too much on others’ perspectives. Consider refinement if you finish writing, and you just aren’t sure your approach works – that’s fine! Return to the refinement phase by deploying one of the refinement techniques to increase your conviction. Remember that we talk about strategy like it’s done in one pass, but almost all real strategy takes many refinement passes. The steps of writing policy are relatively pedestrian, largely because you’ve done so much of the work already in the exploration, diagnosis, and refinement steps. If you skip those phases, you’d likely follow the above steps for writing policy, but the expected quality of the policy itself would be far lower. How many policies? Addressing the entirety of the diagnosis is often complex, which is why most strategies feature a set of policies rather than just one. The strategy for decomposing a monolithic application is not one policy deciding not to decompose, but a series of four policies: Business units should always operate in their own code repository and monolith. New integrations across business unit monoliths should be done using gRPC. Except for new business unit monoliths, we don’t allow new services. Merge existing services into business-unit monoliths where you can. Four isn’t universally the right number either. It’s simply the number that was required to solve that strategy’s diagnosis. With an excellent diagnosis, your policies will often feel inevitable, and perhaps even boring. That’s great: what makes a policy good is that it’s effective, not that it’s novel or inspiring. Kinds of policies While there are so many policies you can write, I’ve found they generally fall into one of four major categories: approvals, allocations, direction, and guidance. This section introduces those categories. Approvals define the process for making a recurring decision. This might require invoking an architecture advice process, or it might require involving an authority figure like an executive. In the Index post-acquisition integration strategy, there were a number of complex decisions to be made, and the approval mechanism was: Escalations come to paired leads: given our limited shared context across teams, all escalations must come to both Stripe’s Head of Traffic Engineering and Index’s Head of Engineering. This allowed the acquired and acquiring teams to start building trust between each other by ensuring both were consulted before any decision was finalized. On the other hand, the user data access strategy’s approval strategy was more focused on managing corporate risk: Exceptions must be granted in writing by CISO. While our overarching Engineering Strategy states that we follow an advisory architecture process as described in Facilitating Software Architecture, the customer data access policy is an exception and must be explicitly approved, with documentation, by the CISO. Start that process in the #ciso channel. These two different approval processes had different goals, so they made tradeoffs differently. There are so many ways to tweak approval, allowing for many different tradeoffs between safety, productivity, and trust. Allocations describe how resources are split across multiple potential investments. Allocations are the most concrete statement of organizational priority, and also articulate the organization’s belief about how productivity happens in teams. Some companies believe you go fast by swarming more people onto critical problems. Other companies believe you go fast by forcing teams to solve problems without additional headcount. Both can work, and teach you something important about the company’s beliefs. The strategy on Uber’s service migration has two concrete examples of allocation policies. The first describes the Infrastructure engineering team’s allocation between manual provision tasks and investing into creating a self-service provisioning platform: Constrain manual provisioning allocation to maximize investment in self-service provisioning. The service provisioning team will maintain a fixed allocation of one full time engineer on manual service provisioning tasks. We will move the remaining engineers to work on automation to speed up future service provisioning. This will degrade manual provisioning in the short term, but the alternative is permanently degrading provisioning by the influx of new service requests from newly hired product engineers. The second allocation policy is implicitly noted in this strategy’s diagnosis, where it describes the allocation policy in the Engineering organization’s higher altitude strategy: Within infrastructure engineering, there is a team of four engineers responsible for service provisioning today. While our organization is growing at a similar rate as product engineering, none of that additional headcount is being allocated directly to the team working on service provisioning. We do not anticipate this changing. Allocation policies often create a surprising amount of clarity for the team, and I include them in almost every policy I write either explicitly, or implicitly in a higher altitude strategy. Direction provides explicit instruction on how a decision must be made. This is the right tool when you know where you want to go, and exactly the way that you want to get there. Direction is appropriate for problems you understand clearly, and you value consistency more than empowering individual judgment. Direction works well when you need an unambiguous policy that doesn’t leave room for interpretation. For example, Calm’s policy for working in the monolith: We write all code in the monolith. It has been ambiguous if new code (especially new application code) should be written in our JavaScript monolith, or if all new code must be written in a new service outside of the monolith. This is no longer ambiguous: all new code must be written in the monolith. In the rare case that there is a functional requirement that makes writing in the monolith implausible, then you should seek an exception as described below. In that case, the team couldn’t agree on what should go into the monolith. Individuals would often make incompatible decisions, so creating consistency required removing personal judgment from the equation. Sometimes judgment is the issue, and sometimes consistency is difficult due to misaligned incentives. A good example of this comes in strategy on working with new Private Equity ownership: We will move to an “N-1” backfill policy, where departures are backfilled with a less senior level. We will also institute a strict maximum of one Principal Engineer per business unit. It’s likely that hiring managers would simply ignore this backfill policy if it was stated more softly, although sometimes less forceful policies are useful. Guidance provides a recommendation about how a decision should be made. Guidance is useful when there’s enough nuance, ambiguity, or complexity that you can explain the desired destination, but you can’t mandate the path to reaching it. One example of guidance comes from the Index acquisition integration strategy: Minimize changes to tokenization environment: because point-of-sale devices directly work with customer payment details, the API that directly supports the point-of-sale device must live within our secured environment where payment details are stored. However, any other functionality must not be added to our tokenization environment. This might read like direction, but it’s clarifying the desired outcome of avoiding unnecessary complexity in the tokenization environment. However, it’s not able to articulate what complexity is necessary, so ultimately it’s guidance because it requires significant judgment to interpret. A second example of guidance comes in the strategy on decomposing a monolithic codebase: Merge existing services into business-unit monoliths where you can. We believe that each choice to move existing services back into a monolith should be made “in the details” rather than from a top-down strategy perspective. Consequently, we generally encourage teams to wind down their existing services outside of their business unit’s monolith, but defer to teams to make the right decision for their local context. This is another case of knowing the desired outcome, but encountering too much uncertainty to direct the team on how to get there. If you ask five engineers about whether it’s possible to merge a given service back into a monolithic codebase, they’ll probably disagree. That’s fine, and highlights the value of guidance: it makes it possible to make incremental progress in areas where more concrete direction would cause confusion. When you’re working on a strategy’s policy section, it’s important to consider all of these categories. Which feel most natural to use will vary depending on your team and role, but they’re all usable: If you’re a developer productivity team, you might have to lean heavily on guidance in your policies and increased support for that guidance within the details of your platform. If you’re an executive, you might lean heavily on direction. Indeed, you might lean too heavily on direction, where guidance often works better for areas where you understand the direction but not the path. If you’re a product engineering organization, you might have to narrow the scope of your direction to the engineers within that organization to deal with the realities of complex cross-organization dynamics. Finally, if you have a clear approach you want to take that doesn’t fit cleanly into any of these categories, then don’t let this framework dissuade you. Give it a try, and adapt if it doesn’t initially work out. Maintaining strategy altitude The chapter on when to write engineering strategy introduced the concept of strategy altitude, which is being deliberate about where certain kinds of policies are created within your organization. Without repeating that section in its entirety, it’s particularly relevant when you set policy to consider how your new policies eliminate flexibility within your organization. Consider these two somewhat opposing strategies: Stripe’s Sorbet strategy only worked in an organization that enforced the use of a single programming language across (essentially) all teams Uber’s service migration strategy worked well in an organization that was unwilling to enforce consistent programming language adoption across teams Stripe’s organization-altitude policy took away the freedom of individual teams to select their preferred technology stack. In return, they unlocked the ability to centralize investment in a powerful way. Uber went the opposite way, unlocking the ability of teams to pick their preferred technology stack, while significantly reducing their centralized teams’ leverage. Both altitudes make sense. Both have consequences. Criteria for effective policies In The Engineering Executive’s Primer’s chapter on engineering strategy, I introduced three criteria for evaluating policies. They ought to be applicable, enforced, and create leverage. Defining those a bit: Applicable: it can be used to navigate complex, real scenarios, particularly when making tradeoffs. Enforced: teams will be held accountable for following the guiding policy. Create Leverage: create compounding or multiplicative impact. The last of these three, create leverage, made sense in the context of a book about engineering executives, but probably doesn’t make as much sense here. Some policies certainly should create leverage (e.g. empower developer experience team by restricting new services), but others might not (e.g. moving to an N-1 backfill policy). Outside the executive context, what’s important isn’t necessarily creating leverage, but that a policy solves for part of the diagnosis. That leaves the other two–being applicable and enforced–both of which are necessary for a policy to actually address the diagnosis. Any policy which you can’t determine how to apply, or aren’t willing to enforce, simply won’t be useful. Let’s apply these criteria to a handful of potential policies. First let’s think about policies we might write to improve the talent density of our engineering team: “We only hire world-class engineers.” This isn’t applicable, because it’s unclear what a world-class engineer means. Because there’s no mutually agreeable definition in this policy, it’s also not consistently enforceable. “We only hire engineers that get at least one ‘strong yes’ in scorecards.” This is applicable, because there’s a clear definition. This is enforceable, depending on the willingness of the organization to reject seemingly good candidates who don’t happen to get a strong yes. Next, let’s think about a policy regarding code reuse within a codebase: “We follow a strict Don’t Repeat Yourself policy in our codebase.” There’s room for debate within a team about whether two pieces of code are truly duplicative, but this is generally applicable. Because there’s room for debate, it’s a very context specific determination to decide how to enforce a decision. “Code authors are responsible for determining if their contributions violate Don’t Repeat Yourself, and rewriting them if they do.” This is much more applicable, because now there’s only a single person’s judgment to assess the potential repetition. In some ways, this policy is also more enforceable, because there’s no longer any ambiguity around who is deciding whether a piece of code is a repetition. The challenge is that enforceability now depends on one individual, and making this policy effective will require holding individuals accountable for the quality of their judgement. An organization that’s unwilling to distinguish between good and bad judgment won’t get any value out of the policy. This is a good example of how a good policy in one organization might become a poor policy in another. If you ever find yourself wanting to include a policy that for some reason either can’t be applied or can’t be enforced, stop to ask yourself what you’re trying to accomplish and ponder if there’s a different policy that might be better suited to that goal. Developing novel policies My experience is that there are vanishingly few truly novel policies to write. There’s almost always someone else has already done something similar to your intended approach. Calm’s engineering strategy is such a case: the details are particular to the company, but the general approach is common across the industry. The most likely place to find truly novel policies is during the adoption phase of a new widespread technology, such as the rise of ubiquitous mobile phones, cloud computing, or large language models. Even then, as explored in the strategy for adopting large-language models, the new technology can be engaged with as a generic technology: Develop an LLM-backed process for reactivating departed and suspended drivers in mature markets. Through modeling our driver lifecycle, we determined that improving onboarding time will have little impact on the total number of active drivers. Instead, we are focusing on mechanisms to reactivate departed and suspended drivers, which is the only opportunity to meaningfully impact active drivers. You could simply replace “LLM” with “data-driven” and it would be equally readable. In this way, policy can generally sidestep areas of uncertainty by being a bit abstract. This avoids being overly specific about topics you simply don’t know much about. However, even if your policy isn’t novel to the industry, it might still be novel to you or your organization. The steps that I’ve found useful to debug novel policies are the same steps as running a condensed version of the strategy process, with a focus on exploration and refinement: Collect a number of similar policies, with a focus on how those policies differ from the policy you are creating Create a systems model to articulate how this policy will work, and also how it will differ from the similar policies you’re considering Run a strategy testing cycle for your proto-policy to discover any unknown-unknowns about how it works in practice Whether you run into this scenario is largely a function of the extent of your, and your organization’s, experience. Early in my career, I found myself doing novel (for me) strategy work very frequently, and these days I rarely find myself doing novel work, instead focusing on adaptation of well-known policies to new circumstances. Are competing policy proposals an anti-pattern? When creating policy, you’ll often have to engage with the question of whether you should develop one preferred policy or a series of potential strategies to pick from. Developing these is a useful stage of setting policy, but rather than helping you refine your policy, I’d encourage you to think of this as exposing gaps in your diagnosis. For example, when Stripe developed the Sorbet ruby-typing tooling, there was debate between two policies: Should we build a ruby-typing tool to allow a centralized team to gradually migrate the company to a typed codebase? Should we migrate the codebase to a preexisting strongly typed language like Golang or Java? These were, initially, equally valid hypotheses. It was only by clarifying our diagnosis around resourcing that it became clear that incurring the bulk of costs in a centralized team was clearly preferable to spreading the costs across many teams. Specifically, recognizing that we wanted to prioritize short-term product engineering velocity, even if it led to a longer migration overall. If you do develop multiple policy options, I encourage you to move the alternatives into an appendix rather than including them in the core of your strategy document. This will make it easier for readers of your final version to understand how to follow your policies, and they are the most important long-term user of your written strategy. Recognizing constraints A similar problem to competing solutions is developing a policy that you cannot possibly fund. It’s easy to get enamored with policies that you can’t meaningfully enforce, but that’s bad policy, even if it would work in an alternate universe where it was possible to enforce or resource it. To consider a few examples: The strategy for controlling access to user data might have proposed requiring manual approval by a second party of every access to customer data. However, that would have gone nowhere. Our approach to Uber’s service migration might have required more staffing for the infrastructure engineering team, but we knew that wasn’t going to happen, so it was a meaningless policy proposal to make. The strategy for navigating private equity ownership might have argued that new ownership should not hold engineering accountable to a new standard on spending. But they would have just invalidated that strategy in the next financial planning period. If you find a policy that contemplates an impractical approach, it doesn’t only indicate that the policy is a poor one, it also suggests your policy is missing an important pillar. Rather than debating the policy options, the fastest path to resolution is to align on the diagnosis that would invalidate potential paths forward. In cases where aligning on the diagnosis isn’t possible, for example because you simply don’t understand the possibilities of a new technology as encountered in the strategy for adopting LLMs, then you’ve typically found a valuable opportunity to use strategy refinement to build alignment. Dealing with missing strategies At a recent company offsite, we were debating which policies we might adopt to deal with annual plans that kept getting derailed after less than a month. Someone remarked that this would be much easier if we could get the executive team to commit to a clearer, written strategy about which business units we were prioritizing. They were, of course, right. It would be much easier. Unfortunately, it goes back to the problem we discussed in the diagnosis chapter about reframing blockers into diagnosis. If a strategy from the company or a peer function is missing, the empowering thing to do is to include the absence in your diagnosis and move forward. Sometimes, even when you do this, it’s easy to fall back into the belief that you cannot set a policy because a peer function might set a conflicting policy in the future. Whether you’re an executive or an engineer, you’ll never have the details you want to make the ideal policy. Meaningful leadership requires taking meaningful risks, which is never something that gets comfortable. Summary After working through this chapter, you know how to develop policy, how to assemble policies to solve your diagnosis, and how to avoid a number of the frequent challenges that policy writers encounter. At this point, there’s only one phase of strategy left to dig into, operating the policies you’ve created.

22 hours ago 3 votes
Fast and random sampling in SQLite

I was building a small feature for the Flickr Commons Explorer today: show a random selection of photos from the entire collection. I wanted a fast and varied set of photos. This meant getting a random sample of rows from a SQLite table (because the Explorer stores all its data in SQLite). I’m happy with the code I settled on, but it took several attempts to get right. Approach #1: ORDER BY RANDOM() My first attempt was pretty naïve – I used an ORDER BY RANDOM() clause to sort the table, then limit the results: SELECT * FROM photos ORDER BY random() LIMIT 10 This query works, but it was slow – about half a second to sample a table with 2 million photos (which is very small by SQLite standards). This query would run on every request for the homepage, so that latency is unacceptable. It’s slow because it forces SQLite to generate a value for every row, then sort all the rows, and only then does it apply the limit. SQLite is fast, but there’s only so fast you can sort millions of values. I found a suggestion from Stack Overflow user Ali to do a random sort on the id column first, pick my IDs from that, and only fetch the whole row for the photos I’m selecting: SELECT * FROM photos WHERE id IN ( SELECT id FROM photos ORDER BY RANDOM() LIMIT 10 ) This means SQLite only has to load the rows it’s returning, not every row in the database. This query was over three times faster – about 0.15s – but that’s still slower than I wanted. Approach #2: WHERE rowid > (…) Scrolling down the Stack Overflow page, I found an answer by Max Shenfield with a different approach: SELECT * FROM photos WHERE rowid > ( ABS(RANDOM()) % (SELECT max(rowid) FROM photos) ) LIMIT 10 The rowid is a unique identifier that’s used as a primary key in most SQLite tables, and it can be looked up very quickly. SQLite automatically assigns a unique rowid unless you explicitly tell it not to, or create your own integer primary key. This query works by picking a point between the biggest and smallest rowid values used in the table, then getting the rows with rowids which are higher than that point. If you want to know more, Max’s answer has a more detailed explanation. This query is much faster – around 0.0008s – but I didn’t go this route. The result is more like a random slice than a random sample. In my testing, it always returned contiguous rows – 101, 102, 103, … – which isn’t what I want. The photos in the Commons Explorer database were inserted in upload order, so photos with adjacent row IDs were uploaded at around the same time and are probably quite similar. I’d get one photo of an old plane, then nine more photos of other planes. I want more variety! (This behaviour isn’t guaranteed – if you don’t add an ORDER BY clause to a SELECT query, then the order of results is undefined. SQLite is returning rows in rowid order in my table, and a quick Google suggests that’s pretty common, but that may not be true in all cases. It doesn’t affect whether I want to use this approach, but I mention it here because I was confused about the ordering when I read this code.) Approach #3: Select random rowid values outside SQLite Max’s answer was the first time I’d heard of rowid, and it gave me an idea – what if I chose random rowid values outside SQLite? This is a less “pure” approach because I’m not doing everything in the database, but I’m happy with that if it gets the result I want. Here’s the procedure I came up with: Create an empty list to store our sample. Find the highest rowid that’s currently in use: sqlite> SELECT MAX(rowid) FROM photos; 1913389 Use a random number generator to pick a rowid between 1 and the highest rowid: >>> import random >>> random.randint(1, max_rowid) 196476 If we’ve already got this rowid, discard it and generate a new one. (The rowid is a signed, 64-bit integer, so the minimum possible value is always 1.) Look for a row with that rowid: SELECT * FROM photos WHERE rowid = 196476 If such a row exists, add it to our sample. If we have enough items in our sample, we’re done. Otherwise, return to step 3 and generate another rowid. If such a row doesn’t exist, return to step 3 and generate another rowid. This requires a bit more code, but it returns a diverse sample of photos, which is what I really care about. It’s a bit slower, but still plenty fast enough (about 0.001s). This approach is best for tables where the rowid values are mostly contiguous – it would be slower if there are lots of rowids between 1 and the max that don’t exist. If there are large gaps in rowid values, you might try multiple missing entries before finding a valid row, slowing down the query. You might want to try something different, like tracking valid rowid values separately. This is a good fit for my use case, because photos don’t get removed from Flickr Commons very often. Once a row is written, it sticks around, and over 97% of the possible rowid values do exist. Summary Here are the four approaches I tried: Approach Performance (for 2M rows) Notes ORDER BY RANDOM() ~0.5s Slowest, easiest to read WHERE id IN (SELECT id …) ~0.15s Faster, still fairly easy to understand WHERE rowid > ... ~0.0008s Returns clustered results Random rowid in Python ~0.001s Fast and returns varied results, requires code outside SQL I’m using the random rowid in Python in the Commons Explorer, trading code complexity for speed. I’m using this random sample to render a web page, so it’s important that it returns quickly – when I was testing ORDER BY RANDOM(), I could feel myself waiting for the page to load. But I’ve used ORDER BY RANDOM() in the past, especially for asynchronous data pipelines where I don’t care about absolute performance. It’s simpler to read and easier to see what’s going on. Now it’s your turn – visit the Commons Explorer and see what random gems you can find. Let me know if you spot anything cool! [If the formatting of this post looks odd in your feed reader, visit the original article]

12 hours ago 1 votes
Choosing Languages
yesterday 3 votes
05 · Syncing Keyhive

How we sync Keyhive and Automerge

yesterday 1 votes