More from On Test Automation
Last weekend, I wrote a more or less casual post on LinkedIn containing the ‘rules’ (it’s more of a list of terms and conditions, really) I set for myself when it comes to using AI. That post received some interesting comments that made me think and refine my thoughts on when (not) to use AI to support me in my work. Thank you to all of you who commented for doing so, and for showing me that there still is value in being active on LinkedIn in between all the AI-generated ‘content’. I really appreciate it.

Now, AI and LLMs like ChatGPT or Claude can be very useful, that is, when used prudently. I think it is very important to be conscious and cautious when it comes to using AI, though, which is why I wrote that post. I wrote it mostly for myself, to structure my thoughts around AI, but also because I think it is important that others are at least conscious of what they’re doing and working with. That doesn’t mean you have to adhere to or even agree with my views and the way I use these tools, by the way. Different strokes for different folks.

Because of the ephemeral nature of these LinkedIn posts, and the importance of the topic to me, I want to repeat the ‘rules’ (again, more of a T&C list) I wrote down here. This is the original, unchanged list from the post I wrote on February 15:

- I only use it to support me in completing tasks I understand. I need to be able to scrutinize the output the AI system produces and see if it is both sound and fit for the purpose I want to use it for.
- I never use it to explain to me something I don’t know yet or don’t understand enough. I have seen and read about too many hallucinations to trust them to teach me what I don’t understand. Instead, I use books, articles, and other content from authors and sources I do trust if I’m looking to learn something new.
- I never EVER use it for creative work. I don’t use AI-generated images anywhere, and all of my blogs, LinkedIn posts, comments, course material and other written text are 100% my own, warts and all. My views, my ideas, my voice.

Interestingly, at the time I wrote this blog post, most of the comments were written in reaction to the first two bullet points. I don’t know exactly why this is the case. It might be because the people who read it agree (which I doubt, seeing the tsunami of AI-generated content that’s around these days), or maybe because there’s a bit of stigma around admitting to using AI for content generation. I don’t know. What I do know is that it is an important principle to me. I wrote about the reasons for that in an earlier blog post, so I won’t repeat myself here.

Like so many terms and conditions, the list I wrote down in this post will probably evolve over time, but what will not change is that I remain very careful about where I do and where I don’t use AI to help me in my work. Especially now that new developments in the AI space are presented to us at an ever increasing pace and the claims around what AI can and will do only get bigger, I think it is wise to remain cautious and look at these developments with a critical and very much human view.
When I build and release new features or bug fixes for RestAssured.Net, I rely heavily on the acceptance tests that I wrote over time. Besides serving as living documentation for the library, I run these tests both locally and on every push to GitHub to see if I didn’t accidentally break something, for different versions of .NET. But how reliable are these tests really? Can I trust them to pass and fail when they should? Did I cover all the things that are important? I speak, write and teach about the importance of testing your tests on a regular basis, so it makes sense to start walking the talk and get more insight into the quality of the RestAssured.Net test suite.

One approach to learning more about the quality of your tests is through a technique called mutation testing. I speak about and demo testing your tests and using mutation testing to do so on a regular basis (you can watch a recent talk here), but until now, I’ve pretty much exclusively used PITest for Java. As RestAssured.Net is a C# library, I can’t use PITest, but I’d heard many good things about Stryker.NET, so this would be a perfect opportunity to finally use it.

Adding Stryker.NET to the RestAssured.Net project

The first step was to add Stryker.NET to the RestAssured.Net project. Stryker.NET is a dotnet tool, so installing it is straightforward: run dotnet new tool-manifest to create a new, project-specific tool manifest (this was the first local dotnet tool for this project) and then dotnet tool install dotnet-stryker to add Stryker.NET as a dotnet tool to the project.

Running mutation tests for the first time

Running mutation tests with Stryker.NET is just as straightforward: dotnet stryker --project RestAssured.Net.csproj from the tests project folder is all it takes. Because both my test suite (about 200 tests) and the project itself are relatively small code bases, and because my test suite runs quickly, running mutation tests for my entire project works for me. It still took around five minutes for the process to complete. If you have a larger code base, and longer-running test suites, you’ll see that mutation testing will take much, much longer. In that case, it’s probably best to start on a subset of your code base and a subset of your test suite.

After five minutes and change, the results are in: Stryker.NET created 538 mutants from my application code base. Of these:

- 390 were killed, that is, at least one test failed because of this mutation,
- 117 survived, that is, the change did not make any of the tests fail, and
- 31 resulted in a timeout, which I’ll need to investigate further, but I suspect it has something to do with HTTP timeouts (RestAssured.Net is an HTTP API testing library, and all acceptance tests perform actual HTTP requests).

This leads to an overall mutation testing score of 59.97%. Is that good? Is that bad? In all honesty, I don’t know, and I don’t care. Just like with code coverage, I am not a fan of setting fixed targets for this type of metric, as these will typically lead to writing tests for the sake of improving a score rather than for actual improvement of the code. What I am much more interested in is the information that Stryker.NET produced during the mutation testing process.

Opening the HTML report

I was surprised to see that out of the box, Stryker.NET produces a very good-looking and incredibly helpful HTML report. It provides both a high-level overview of the results and in-depth detail for every mutant that was killed or that survived.
It offers a breakdown of the results per namespace and per class, and it is the starting point for further drilling down into results for individual mutants. Let’s have a look and see if the report provides some useful, actionable information for us to improve the RestAssured.Net test suite. Missing coverage Like many other mutation testing tools, Stryker.NET provides code coverage information along with mutation coverage information. That is, if there is code in the application code base that was mutated, but that is not covered by any of the tests, Stryker.NET will inform you about it. Here’s an example: Stryker.NET changed the message of an exception thrown when RestAssured.Net is asked to deserialize a response body that is either null or empty. Apparently, there is no test in the test suite that covers this path in the code. As this particular code path deals with exception handling, it’s probably a good idea to add a test for it: [Test] public void EmptyResponseBodyThrowsTheExpectedException() { var de = Assert.Throws<DeserializationException>(() => { Location responseLocation = (Location)Given() .When() .Get($"{MOCK_SERVER_BASE_URL}/empty-response-body") .DeserializeTo(typeof(Location)); }); Assert.That(de?.Message, Is.EqualTo("Response content is null or empty.")); } I added the corresponding test in this commit. Removed code blocks Another type of mutant that Stryker.NET generates is the removal of a code block. Going by the mutation testing report, it seems like there are a few of these mutants that are not detected by any of the tests. Here’s an example: The return statement for the Put() method body, which is used to perform an HTTP PUT operation, is replaced with an empty method body, but this is not picked up by any of the tests. The same applies to the methods for HTTP PATCH, DELETE, HEAD and OPTIONS. Looking at the tests that cover the different HTTP verbs, this makes sense. While I do call each of these HTTP methods in a test, I don’t assert on the result for the aforementioned HTTP verbs. I am basically relying on the fact that no exception is thrown when I call Put() when I say ‘it works’. Let’s change that by at least asserting on a property of the response that is returned when these HTTP verbs are used: [Test] public void HttpPutCanBeUsed() { Given() .When() .Put($"{MOCK_SERVER_BASE_URL}/http-put") .Then() .StatusCode(200); } These assertions were added to the RestAssured.Net test suite in this commit. Improving testability The next signal I received from this initial mutation testing run is an interesting one. It tells me that even though I have acceptance tests that add cookies to the request and that only pass when the request contains the cookies I set, I’m not properly covering some logic that I added: To understand what is going on here, it is useful to know that a Cookie in C# offers a constructor that creates a Cookie specifying only a name and a value, but that a cookie has to have a domain value set. To enforce that, I added the logic you see in the screenshot. However, Stryker.NET tells me I’m not properly testing this logic, because changing its implementation doesn’t cause any tests to fail. Now, I might be able to test this specific logic with a few added acceptance tests, but it really is only a small piece of logic, and I should be able to test that logic in isolation, right? 
Well, not with the code written in the way it currently is… So, time to extract that piece of logic into a class of its own, which will improve both the modularity of the code and allow me to test it in isolation. First, let’s extract the logic into a CookieUtils class: internal class CookieUtils { internal Cookie SetDomainFor(Cookie cookie, string hostname) { if (string.IsNullOrEmpty(cookie.Domain)) { cookie.Domain = hostname; } return cookie; } } I deliberately made this class internal as I don’t want it to be directly accessible to RestAssured.Net users. However, as I do need to access it in the tests, I have to add this little snippet to the RestAssured.Net.csproj file: <ItemGroup> <InternalsVisibleTo Include="$(MSBuildProjectName).Tests" /> </ItemGroup> Now, I can add unit tests that should cover both paths in the SetDomainFor() logic: [Test] public void CookieDomainIsSetToDefaultValueWhenNotSpecified() { Cookie cookie = new Cookie("cookie_name", "cookie_value"); CookieUtils cookieUtils = new CookieUtils(); cookie = cookieUtils.SetDomainFor(cookie, "localhost"); Assert.That(cookie.Domain, Is.EqualTo("localhost")); } [Test] public void CookieDomainIsUnchangedWhenSpecifiedAlready() { Cookie cookie = new Cookie("cookie_name", "cookie_value", "/my_path", "strawberry.com"); CookieUtils cookieUtils = new CookieUtils(); cookie = cookieUtils.SetDomainFor(cookie, "localhost"); Assert.That(cookie.Domain, Is.EqualTo("strawberry.com")); } These changes were added to the RestAssured.Net source and test code in this commit. An interesting mutation So far, all the signals that appeared in the mutation testing report generated by Stryker.NET have been valuable, as in: they have pointed me at code that isn’t covered by any tests yet, to tests that could be improved, and they have led to code refactoring to improve testability. Using Stryker.NET (and mutation testing in general) does sometimes lead to some, well, interesting mutations, like this one: I’m checking that a certain string is either null or an empty string, and if either condition is true, RestAssured.Net throws an exception. Perfectly valid. However, Stryker.NET changes the logical OR to a logical AND (a common mutation), which makes it impossible for the condition to evaluate to true. Is that even a useful mutation to make? Well, to some extent, it is. Even if the code doesn’t make sense anymore after it has been mutated, it does tell you that your tests for this logical condition probably need some improvement. In this case, I don’t have to add more tests, as we discussed this exact statement earlier (remember that it had no test coverage at all). It did make me look at this statement once again, though, and I only then realized that I could simplify this code snippet to if (string.IsNullOrEmpty(responseBodyAsString)) { throw new DeserializationException("Response content is null or empty."); } Instead of a custom-built logical OR, I am now using a construct built into C#, which is arguably the safer choice. In general, if your mutation testing tool generates several (or even many) mutants for the same code statement or block, it might be a good idea to have another look at that code and see if it can be simplified. This was just a very small example, but I think this observation holds true in general. This change was added to the RestAssured.Net source and test code in this commit. 
Running mutation tests again and inspecting the results

Now that several (supposed) improvements to the tests and the code have been made, let’s run the mutation tests another time to see if the changes improved our score. In short:

- 397 mutants were killed now, up from 390 (that’s good)
- 111 mutants survived, down from 117 (that’s also good)
- there were 32 timeouts, up from 31 (that needs some further investigation)

Overall, the mutation testing score went up from 59.97% to 61.11%. This might not seem like much, but it is definitely a step in the right direction. The most important thing for me right now is that my tests for RestAssured.Net have improved, my code has improved and I learned a lot about mutation testing and Stryker.NET in the process.

Am I going to run mutation tests every time I make a change? Probably not. There is quite a lot of information to go through, and that takes time, time that I don’t want to spend on every build. For that reason, I’m also not going to make these mutation tests part of the build and test pipeline for RestAssured.Net, at least not any time soon. This was nonetheless both a very valuable and a very enjoyable exercise, and I’ll definitely keep improving the tests and the code for RestAssured.Net using the suggestions that Stryker.NET presents.
As is the case every year, 2025 is starting off relatively slowly. There are not a lot of training courses to run yet, and since a few of the projects I worked on wrapped up in December, I find myself with a little bit of extra time and headspace on my hands. I actually enjoy these slower moments, because they give me some time to think about where my professional career is going, if I’m still happy with the direction it is going in, and what I would like to see changed. Last year, I quit doing full-time projects as an individual contributor to development teams in favour of part-time consultancy work and more focus on my training services. 2024 has been a great year overall, and I would be happy to continue working in this way in 2025. However, as a thought experiment, I took some time to think about what it would take for me to go back to full-time roles, or maybe (maybe!) even consider joining a company on a permanent basis.

Please note that this post is not intended as an ‘I need a job!’ cry for help. My pipeline for 2025 is slowly but surely filling up, and again, I am very happy with the direction my career is going at the moment. However, I have learned that it never hurts to leave your options open, and even though I love the variety in my working days these days, I think I would enjoy working with one team, on one goal, for an extended amount of time, too, under the right conditions. If nothing else, this post might serve as a reference post to send to people and companies that reach out to me with a full-time contract opportunity or even a permanent job opening. This is also not a list of requirements that is set in stone. As my views on what would make a great job change (and they will), I will update this post to reflect those views.

So, to even consider joining a company on a full-time contract or even a permanent basis, there are basically three things I need to consider:

- What does the job look like? What will I be doing on a day-to-day basis?
- What are the must-haves regarding terms and conditions?
- What are the nice to haves that would provide the icing on the cake for me?

Let’s take a closer look at each of these things.

What I look for in a job

As I mentioned before, I am not looking for a job as an individual contributor to a development team. I have done that for many years, and it does not really give me the energy that it used to. On the other hand, I am definitely not looking for a hands-off, managerial kind of role, as I’m quite sure I would make an atrocious manager. Plus, I simply enjoy being hands-on and writing code way too much to let that go. I would like to be responsible for designing and implementing the testing and automation strategy for a product I believe in. It would be a lead role, but, as mentioned, with plenty (as in daily) opportunities to get hands-on and contribute to the code. The work would have to be technically and mentally challenging enough to keep me motivated in the long term. Getting bored quickly is something I suffer from, which is the main driver behind only doing part-time projects and working on multiple different things in parallel right now. I don’t want to work for a consultancy and be ‘farmed out’ to their clients. I’ve done that pretty much my entire career, and if that’s what the job will look like, I’d rather keep working the way I’m working now.
The must-haves

There are (quite) a few things that are non-negotiable for me to even consider joining a company full time, no matter if it’s on a contract or a permanent basis.

- The pay must be excellent. Let’s not beat around the bush here: people work to make money. I do, too. I’m doing very well right now, and I don’t want that to change.
- The company should be output-focused, as in they don’t care when I work, how many hours I put in and where I work from, as long as the job gets done. I am sort of spoiled by my current way of working, I fully realise that, but I’ve grown to love the flexibility. By the way, please don’t read ‘flexible’ as ‘working willy-nilly’. Most work is not done in a vacuum, and you will have to coordinate with others. The key word here is ‘balance’.
- Collaboration should be part of the company culture. I enjoy working in pair programming and pair testing setups. What I do not like are pointless meetings, and that includes having Scrum ceremonies ‘just because’.
- The company should be a remote-first company. I don’t mind the occasional office day, but I value my time too much to spend hours per week on commuting. I’ve done that for years, and it is time I’ll never get back.
- The company should actively encourage and support me in contributing to conferences and meetups. Public speaking is an important part of my career at the moment, and I get a lot of value from it. I don’t want to give that up.
- There should be plenty of opportunities for teaching others. This is what I do for a living right now, I really enjoy it, and I’d like to think I’m pretty good at it, too. Just like with the public speaking, I don’t want to give that up. This teaching can take many forms, though. Running workshops and regular pairing with others are just two examples.
- The job should scratch my travel itch. I travel abroad for work on average about 5-6 times per year these days, and I would like to keep doing that, as I get a lot of energy from seeing different places and meeting people. Please note that ‘traveling’ and ‘commuting’ are two completely different things.

Yes, I realise this is quite a long list, but I really enjoy my career at the moment, and there are a lot of aspects to it that I’m not ready to give up.

The nice to haves

There are also some things that are not strictly necessary, but would be very nice to have in a job or full-time contract:

- The opportunity to continue working on side gigs. I have a few returning customers that I’ve been working with for years, and I would really appreciate the opportunity to continue doing that. I realise that I would have to give up some things, but there are a few clients that I would really like to keep working with. By the way, this is only a nice to have for permanent jobs. For contracting gigs, it is a must-have.
- It would be very nice if the technology stack that the company is using is based on C#. I’ve been doing quite a bit of work in this stack over the years and I would like to go even deeper.
- If the travel itch I mentioned under the must-haves could be scratched with regular travel to Canada, Norway or South Africa, three of my favourite destinations in the world, that would be a very big plus.

I realise that the list of requirements above is a long one. I don’t think there is a single job out there that ticks all the boxes. But, again, I really like what I’m doing at the moment, and most of the boxes are already ticked.
I would absolutely consider going full time with a client or even an employer, but I want it to be a step forward, not a step back. After all, this is mostly a thought experiment at the moment, and until that perfect contract or job comes along, I’ll happily continue what I’m doing right now.
As a (sort of) follow-up post to my yearly review for 2024, in this post, I would like to go over the changes, bug fixes and new features that have been introduced in RestAssured .NET in 2024. This year, I released 7 new versions of the library, and while none of the versions included changes that were worthy of a blog post of their own, I thought it would be a good idea to wrap them all up in a single overview. Basically, this blog post is an extended version of the library’s CHANGELOG. I’ll go through the new versions chronologically, starting with the first release of 2024.

Version 4.2.2 - released April 23

RestAssured .NET 4.2.2 fixes a bug that prevented JSON responses that are an array from being properly verified. In other words, if the JSON response body looks like this:

[ { "id": 1, "text": "Do the dishes" }, { "id": 2, "text": "Clean out the trash" }, { "id": 3, "text": "Read the newspaper" } ]

I would expect this test to pass:

[Test]
public void JsonArrayResponseBodyElementCanBeVerifiedUsingNHamcrestMatcher()
{
    Given()
    .When()
    .Get("http://localhost:9876/json-array-response-body")
    .Then()
    .StatusCode(200)
    .Body("$[1].text", NHamcrest.Is.EqualTo("Clean out the trash"));
}

but prior to this version, it threw a Newtonsoft.Json.JsonReaderException. The solution? Adding a try-catch that first tries to parse the JSON response as a JObject (equal to the existing behaviour), catches the JsonReaderException if that fails, and then tries again, now parsing the JSON response into a JArray. A minimal sketch of this approach is included at the end of this post. That made the newly added test pass without failing any other tests. Another demonstration of the added value of having a decent set of tests. RestAssured .NET is slowly growing and becoming more complex, and having a test suite I can run locally, and that always runs when I push code to GitHub, is an invaluable safety net for me. These tests run in a few seconds, yet they give me invaluable feedback on the effect of new features, bug fixes and code refactoring efforts. I haven’t heard back from the person submitting the original issue, but I assume that this fixed their issue.

Version 4.3.0 - released August 16

I love learning about how people use RestAssured .NET, because invariably they will use it in ways I haven’t foreseen. I was unfamiliar with the concept of server-sent events (SSE) in APIs, for example, yet there are people looking to test these kinds of APIs using RestAssured .NET. It turned out that what this user was looking for was a way to set the HttpCompletionOption value on the System.Net.Http.HttpClient that is wrapped by RestAssured .NET. To enable this, I added a method to the DSL that looks like this:

Given()
.UseHttpCompletionOption(HttpCompletionOption.ResponseHeadersRead)

I also added the option to specify the HttpCompletionOption to be used in a RequestSpecification as well as in the global config. A straightforward fix that solved the problem for this specific user. The only thing I don’t like here is that I don’t know of a way to test this locally. Do you? I would love to hear it.

Version 4.3.1 - released August 22

Another user pointed out to me that trying to verify that the value of a JSON response body element is an empty array also threw an exception.
So, if the JSON response body looks like this:

{ "success": true, "errors": [] }

this test should pass, but instead it threw a Newtonsoft.Json.JsonSerializationException:

[Test]
public void JsonResponseBodyElementEmptyArrayValueCanBeVerifiedUsingNHamcrestMatcher()
{
    Given()
    .When()
    .Get("http://localhost:9876/json-empty-array-response-body")
    .Then()
    .StatusCode(200)
    .Body("$.errors", NHamcrest.Is.OfLength(0));
}

The fix? Adding some code that checks if the element returned when evaluating the JsonPath expression is a JArray or a JObject and using the right matching logic accordingly. I used my preferred procedure here:

- first, write a failing test that reproduces the issue
- then, make the test pass without breaking any other tests
- refactor the code, document and release

Does this procedure sound familiar to you?

Version 4.4.0 - released October 21

As you can probably tell from the semantic versioning, this version introduced a new feature to RestAssured .NET: the ability to use NTLM authentication when making an HTTP call. To enable this, I added a new method to the DSL:

Given()
.NtlmAuth() // This one uses default NTLM credentials for the current user
.NtlmAuth("username", "password", "domain") // This one uses custom NTLM credentials

As I had no idea how to write a proper test for this, even though I had tested it before releasing using Fiddler, I released a beta version first that the person submitting the issue could use to verify the solution. I’m happy to say that it worked for them and that the solution could be released properly. Again, if someone can think of a way to add a proper test for NTLM authentication to the test suite, I would love to hear it. All that the current tests do is run the code and see if no exception is thrown. Not a good test, but until I find a better way, it will have to do.

Version 4.5.0 - released November 19

This version introduced not one, but two changes. First, since .NET 9 was officially released earlier that week (or maybe the week before, I forgot), I needed to release a RestAssured .NET version that targets .NET 9, so I did. Just like with .NET 8, I didn’t really have to change anything in the code other than adding net9.0 to the TargetFrameworks and adding .NET 9 to the build pipeline for the library to make sure that every change is tested on .NET 9, too. Happy to say it all ‘just worked’.

The other change took more effort: a user reported that they could not override the ResponseLogLevel set in a RequestSpecification at the individual test level. The reason? In the existing code, the response was logged directly after the HTTP call completed, so before any calls to Log() for the response. When Log() was then called on the response, it was logged again. I have no idea how I completely overlooked this until now, but I did. Rewriting the code to make this work took longer than I expected, but I managed in the end, through quite a bit of trial and error and lots of human-centered testing (again, no idea how to write automated tests for this).
The logging functionality of RestAssured .NET is something I intend to rewrite in the future, for a couple of reasons:

- It’s impossible to write automated tests for it (or at least I don’t know how to do this)
- Ideally, I want the logging to be more configurable and extensible to give users more flexibility than they have at the moment

Version 4.5.1 - released November 20

As one does, I found an issue with the updated logging logic almost immediately after releasing 4.5.0 to the public: masking of sensitive headers and cookies didn’t work anymore when specified as part of a RequestSpecification. Lucky for me, this was a quick fix, but a bit embarrassing nonetheless. Had I had proper automated tests for the logging in place, I probably would have caught this before releasing 4.5.0… Anyway, it’s fixed now, as far as I can tell.

Version 4.6.0 - released December 9

The final RestAssured .NET release of 2024 added the capability to strip the ; charset=<some_charset> from the Content-Type header in a request. It turns out that some APIs explicitly expect this header to not contain the charset suffix, but the way I create a request, or rather, the way .NET creates a StringContent object, adds it by default. A short illustration of this behaviour is included at the end of this post. This issue was a great example of one of the main reasons why I started this project: there is so much I don’t know yet about HTTP, APIs, C#/.NET and other technologies, and working on these issues and improving RestAssured .NET gives me an opportunity to learn them. I make a habit of writing what I learned down in the issue on GitHub, so I can review it later, and so I can point others to these links and thoughts, too.

So, if you’re looking for a way to strip the charset identifier from the Content-Type header in the request, you can now do that by passing an optional second boolean argument to Body() (defaults to false):

Given()
.Body(your_body_goes_here, stripCharset: true)

That’s it! As you can see, lots of small changes, bug fixes and new features have been added to RestAssured .NET this year. Oh, and before I forget: with every release, I also made sure to update the dependencies I use to create and test RestAssured .NET to their latest versions. I consider that good housekeeping, and it’s all part of keeping a library up to date. I am looking forward to seeing the library evolve and improve further in 2025.
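As promised, here are two short illustrations of changes described above. First, a minimal, standalone sketch of the parse-as-JObject-first, fall-back-to-JArray approach behind the 4.2.2 fix. This is not the actual RestAssured .NET implementation, just the general pattern using Newtonsoft.Json; the ResponseBodyParser class name is made up for this example:

```csharp
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

internal static class ResponseBodyParser
{
    internal static JToken Parse(string responseBodyAsString)
    {
        try
        {
            // Equal to the existing behaviour: most JSON response bodies are objects.
            return JObject.Parse(responseBodyAsString);
        }
        catch (JsonReaderException)
        {
            // Parsing an array as an object throws a JsonReaderException,
            // so try again, now parsing the response body into a JArray.
            return JArray.Parse(responseBodyAsString);
        }
    }
}
```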
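Second, an illustration of the default .NET behaviour behind the 4.6.0 change: StringContent adds a charset parameter to the Content-Type header, and clearing the CharSet property is one way to remove it again. Again, this is a sketch to show the behaviour, not how RestAssured .NET implements stripCharset under the hood:

```csharp
using System;
using System.Net.Http;
using System.Text;

class StringContentCharsetDemo
{
    static void Main()
    {
        // StringContent appends '; charset=utf-8' to the media type by default.
        var content = new StringContent("{\"title\": \"example\"}", Encoding.UTF8, "application/json");
        Console.WriteLine(content.Headers.ContentType); // application/json; charset=utf-8

        // Clearing the CharSet property removes the suffix from the Content-Type header.
        content.Headers.ContentType!.CharSet = null;
        Console.WriteLine(content.Headers.ContentType); // application/json
    }
}
```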
More in programming
Revealed: How the UK tech secretary uses ChatGPT for policy advice by Chris Stokel-Walker for the New Scientist
This book’s introduction started by defining strategy as “making decisions.” Then we dug into exploration, diagnosis, and refinement: three chapters where you could argue that we didn’t decide anything at all. Clarifying the problem to be solved is the prerequisite of effective decision making, but eventually decisions do have to be made. Here in this chapter on policy, and the following chapter on operations, we finally start to actually make some decisions. In this chapter, we’ll dig into:

- How we define policy, and how setting policy differs from operating policy as discussed in the next chapter
- The structured steps for setting policy
- How many policies should you set? Is it preferable to have one policy, many policies, or does it not matter much either way?
- Recurring kinds of policies that appear frequently in strategies
- Why it’s valuable to be intentional about your strategy’s altitude, and how engineers and executives generally maintain different altitudes in their strategies
- Criteria to use for evaluating whether your policies are likely to be impactful
- How to develop novel policies, and why it’s rare
- Why having multiple bundles of alternative policies is generally a phase in strategy development that indicates a gap in your diagnosis
- How policies that ignore constraints sound inspirational, but accomplish little
- Dealing with ambiguity and uncertainty created by missing strategies from cross-functional stakeholders

By the end, you’ll be ready to evaluate why an existing strategy’s policies are struggling to make an impact, and to start iterating on policies for strategy of your own. This is an exploratory, draft chapter for a book on engineering strategy that I’m brainstorming in #eng-strategy-book. As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts.

What is policy?

Policy is interpreting your diagnosis into a concrete plan. That plan will be a collection of decisions, tradeoffs, and approaches. They’ll range from coding practices, to hiring mandates, to architectural decisions, to guidance about how choices are made within your organization. An effective policy solves the entirety of the strategy’s diagnosis, although the diagnosis itself is encouraged to specify which aspects can be ignored. For example, the strategy for working with private equity ownership acknowledges in its diagnosis that they don’t have clear guidance on what kind of reduction to expect: Based on general practice, it seems likely that our new Private Equity ownership will expect us to reduce R&D headcount costs through a reduction. However, we don’t have any concrete details to make a structured decision on this, and our approach would vary significantly depending on the size of the reduction. Faced with that uncertainty, the policy simply acknowledges the ambiguity and commits to reconsider when more information becomes available: We believe our new ownership will provide a specific target for Research and Development (R&D) operating expenses during the upcoming financial year planning. We will revise these policies again once we have explicit targets, and will delay planning around reductions until we have those numbers to avoid running two overlapping processes. There are two frequent points of confusion when creating policies that are worth addressing directly: Policy is a subset of strategy, rather than the entirety of strategy, because policy is only meaningful in the context of the strategy’s diagnosis.
For example, the “N-1 backfill policy” makes sense in the context of new, private equity ownership. The policy wouldn’t work well in a rapidly expanding organization. Any strategy without a policy is useless, but you’ll also find policies without context aren’t worth much either. This is particularly unfortunate, because so often strategies are communicated without those critical sections. Policy describes how tradeoffs should be made, but it doesn’t verify how the tradeoffs are actually being made in practice. The next chapter on operations covers how to inspect an organization’s behavior to ensure policies are followed. When reworking a strategy to be more readable, it often makes sense to merge policy and operation sections together. However, when drafting strategy it’s valuable to keep them separate. Yes, you might use a weekly meeting to review whether the policy is being followed, but whether it’s an effective policy is independent of having such a meeting, and what operational mechanisms you use will vary depending on the number of policies you intend to implement. With this definition in mind, now we can move onto the more interesting discussion of how to set policy. How to set policy Every part of writing a strategy feels hard when you’re doing it, but I personally find that writing policy either feels uncomfortably easy or painfully challenging. It’s never a happy medium. Fortunately, the exploration and diagnosis usually come together to make writing your policy simple: although sometimes that simple conclusion may be a difficult one to swallow. The steps I follow to write a strategy’s policy are: Review diagnosis to ensure it captures the most important themes. It doesn’t need to be perfect, but it shouldn’t have omissions so obvious that you can immediately identify them. Select policies that address the diagnosis. Explicitly match each policy to one or more diagnoses that it addresses. Continue adding policies until every diagnosis is covered. This is a broad instruction, but it’s simpler than it sounds because you’ll typically select from policies identified during your exploration phase. However, there certainly is space to tweak those policies, and to reapply familiar policies to new circumstances. If you do find yourself developing a novel policy, there’s a later section in this chapter, Developing novel policies, that addresses that topic in more detail. Consolidate policies in cases where they overlap or adjoin. For example, two policies about specific teams might be generalized into a policy about all teams in the engineering organization. Backtest policy against recent decisions you’ve made. This is particularly effective if you maintain a decision log in your organization. Mine for conflict once again, much as you did in developing your diagnosis. Emphasize feedback from teams and individuals with a different perspective than your own, but don’t wholly eliminate those that you agree with. Just as it’s easy to crowd out opposing views in diagnosis if you don’t solicit their input, it’s possible to accidentally crowd out your own perspective if you anchor too much on others’ perspectives. Consider refinement if you finish writing, and you just aren’t sure your approach works – that’s fine! Return to the refinement phase by deploying one of the refinement techniques to increase your conviction. Remember that we talk about strategy like it’s done in one pass, but almost all real strategy takes many refinement passes. 
The steps of writing policy are relatively pedestrian, largely because you’ve done so much of the work already in the exploration, diagnosis, and refinement steps. If you skip those phases, you’d likely follow the above steps for writing policy, but the expected quality of the policy itself would be far lower. How many policies? Addressing the entirety of the diagnosis is often complex, which is why most strategies feature a set of policies rather than just one. The strategy for decomposing a monolithic application is not one policy deciding not to decompose, but a series of four policies: Business units should always operate in their own code repository and monolith. New integrations across business unit monoliths should be done using gRPC. Except for new business unit monoliths, we don’t allow new services. Merge existing services into business-unit monoliths where you can. Four isn’t universally the right number either. It’s simply the number that was required to solve that strategy’s diagnosis. With an excellent diagnosis, your policies will often feel inevitable, and perhaps even boring. That’s great: what makes a policy good is that it’s effective, not that it’s novel or inspiring. Kinds of policies While there are so many policies you can write, I’ve found they generally fall into one of four major categories: approvals, allocations, direction, and guidance. This section introduces those categories. Approvals define the process for making a recurring decision. This might require invoking an architecture advice process, or it might require involving an authority figure like an executive. In the Index post-acquisition integration strategy, there were a number of complex decisions to be made, and the approval mechanism was: Escalations come to paired leads: given our limited shared context across teams, all escalations must come to both Stripe’s Head of Traffic Engineering and Index’s Head of Engineering. This allowed the acquired and acquiring teams to start building trust between each other by ensuring both were consulted before any decision was finalized. On the other hand, the user data access strategy’s approval strategy was more focused on managing corporate risk: Exceptions must be granted in writing by CISO. While our overarching Engineering Strategy states that we follow an advisory architecture process as described in Facilitating Software Architecture, the customer data access policy is an exception and must be explicitly approved, with documentation, by the CISO. Start that process in the #ciso channel. These two different approval processes had different goals, so they made tradeoffs differently. There are so many ways to tweak approval, allowing for many different tradeoffs between safety, productivity, and trust. Allocations describe how resources are split across multiple potential investments. Allocations are the most concrete statement of organizational priority, and also articulate the organization’s belief about how productivity happens in teams. Some companies believe you go fast by swarming more people onto critical problems. Other companies believe you go fast by forcing teams to solve problems without additional headcount. Both can work, and teach you something important about the company’s beliefs. The strategy on Uber’s service migration has two concrete examples of allocation policies. 
The first describes the Infrastructure engineering team’s allocation between manual provisioning tasks and investment in creating a self-service provisioning platform: Constrain manual provisioning allocation to maximize investment in self-service provisioning. The service provisioning team will maintain a fixed allocation of one full time engineer on manual service provisioning tasks. We will move the remaining engineers to work on automation to speed up future service provisioning. This will degrade manual provisioning in the short term, but the alternative is permanently degrading provisioning by the influx of new service requests from newly hired product engineers.

The second allocation policy is implicitly noted in this strategy’s diagnosis, where it describes the allocation policy in the Engineering organization’s higher altitude strategy: Within infrastructure engineering, there is a team of four engineers responsible for service provisioning today. While our organization is growing at a similar rate as product engineering, none of that additional headcount is being allocated directly to the team working on service provisioning. We do not anticipate this changing. Allocation policies often create a surprising amount of clarity for the team, and I include them in almost every policy I write either explicitly, or implicitly in a higher altitude strategy.

Direction provides explicit instruction on how a decision must be made. This is the right tool when you know where you want to go, and exactly the way that you want to get there. Direction is appropriate for problems you understand clearly, and where you value consistency more than empowering individual judgment. Direction works well when you need an unambiguous policy that doesn’t leave room for interpretation. For example, Calm’s policy for working in the monolith: We write all code in the monolith. It has been ambiguous if new code (especially new application code) should be written in our JavaScript monolith, or if all new code must be written in a new service outside of the monolith. This is no longer ambiguous: all new code must be written in the monolith. In the rare case that there is a functional requirement that makes writing in the monolith implausible, then you should seek an exception as described below. In that case, the team couldn’t agree on what should go into the monolith. Individuals would often make incompatible decisions, so creating consistency required removing personal judgment from the equation.

Sometimes judgment is the issue, and sometimes consistency is difficult due to misaligned incentives. A good example of this comes in the strategy on working with new Private Equity ownership: We will move to an “N-1” backfill policy, where departures are backfilled with a less senior level. We will also institute a strict maximum of one Principal Engineer per business unit. It’s likely that hiring managers would simply ignore this backfill policy if it was stated more softly, although sometimes less forceful policies are useful.

Guidance provides a recommendation about how a decision should be made. Guidance is useful when there’s enough nuance, ambiguity, or complexity that you can explain the desired destination, but you can’t mandate the path to reaching it.
One example of guidance comes from the Index acquisition integration strategy: Minimize changes to tokenization environment: because point-of-sale devices directly work with customer payment details, the API that directly supports the point-of-sale device must live within our secured environment where payment details are stored. However, any other functionality must not be added to our tokenization environment. This might read like direction, but it’s clarifying the desired outcome of avoiding unnecessary complexity in the tokenization environment. However, it’s not able to articulate what complexity is necessary, so ultimately it’s guidance because it requires significant judgment to interpret. A second example of guidance comes in the strategy on decomposing a monolithic codebase: Merge existing services into business-unit monoliths where you can. We believe that each choice to move existing services back into a monolith should be made “in the details” rather than from a top-down strategy perspective. Consequently, we generally encourage teams to wind down their existing services outside of their business unit’s monolith, but defer to teams to make the right decision for their local context. This is another case of knowing the desired outcome, but encountering too much uncertainty to direct the team on how to get there. If you ask five engineers about whether it’s possible to merge a given service back into a monolithic codebase, they’ll probably disagree. That’s fine, and highlights the value of guidance: it makes it possible to make incremental progress in areas where more concrete direction would cause confusion. When you’re working on a strategy’s policy section, it’s important to consider all of these categories. Which feel most natural to use will vary depending on your team and role, but they’re all usable: If you’re a developer productivity team, you might have to lean heavily on guidance in your policies and increased support for that guidance within the details of your platform. If you’re an executive, you might lean heavily on direction. Indeed, you might lean too heavily on direction, where guidance often works better for areas where you understand the direction but not the path. If you’re a product engineering organization, you might have to narrow the scope of your direction to the engineers within that organization to deal with the realities of complex cross-organization dynamics. Finally, if you have a clear approach you want to take that doesn’t fit cleanly into any of these categories, then don’t let this framework dissuade you. Give it a try, and adapt if it doesn’t initially work out. Maintaining strategy altitude The chapter on when to write engineering strategy introduced the concept of strategy altitude, which is being deliberate about where certain kinds of policies are created within your organization. Without repeating that section in its entirety, it’s particularly relevant when you set policy to consider how your new policies eliminate flexibility within your organization. Consider these two somewhat opposing strategies: Stripe’s Sorbet strategy only worked in an organization that enforced the use of a single programming language across (essentially) all teams Uber’s service migration strategy worked well in an organization that was unwilling to enforce consistent programming language adoption across teams Stripe’s organization-altitude policy took away the freedom of individual teams to select their preferred technology stack. 
In return, they unlocked the ability to centralize investment in a powerful way. Uber went the opposite way, unlocking the ability of teams to pick their preferred technology stack, while significantly reducing their centralized teams’ leverage. Both altitudes make sense. Both have consequences. Criteria for effective policies In The Engineering Executive’s Primer’s chapter on engineering strategy, I introduced three criteria for evaluating policies. They ought to be applicable, enforced, and create leverage. Defining those a bit: Applicable: it can be used to navigate complex, real scenarios, particularly when making tradeoffs. Enforced: teams will be held accountable for following the guiding policy. Create Leverage: create compounding or multiplicative impact. The last of these three, create leverage, made sense in the context of a book about engineering executives, but probably doesn’t make as much sense here. Some policies certainly should create leverage (e.g. empower developer experience team by restricting new services), but others might not (e.g. moving to an N-1 backfill policy). Outside the executive context, what’s important isn’t necessarily creating leverage, but that a policy solves for part of the diagnosis. That leaves the other two–being applicable and enforced–both of which are necessary for a policy to actually address the diagnosis. Any policy which you can’t determine how to apply, or aren’t willing to enforce, simply won’t be useful. Let’s apply these criteria to a handful of potential policies. First let’s think about policies we might write to improve the talent density of our engineering team: “We only hire world-class engineers.” This isn’t applicable, because it’s unclear what a world-class engineer means. Because there’s no mutually agreeable definition in this policy, it’s also not consistently enforceable. “We only hire engineers that get at least one ‘strong yes’ in scorecards.” This is applicable, because there’s a clear definition. This is enforceable, depending on the willingness of the organization to reject seemingly good candidates who don’t happen to get a strong yes. Next, let’s think about a policy regarding code reuse within a codebase: “We follow a strict Don’t Repeat Yourself policy in our codebase.” There’s room for debate within a team about whether two pieces of code are truly duplicative, but this is generally applicable. Because there’s room for debate, it’s a very context specific determination to decide how to enforce a decision. “Code authors are responsible for determining if their contributions violate Don’t Repeat Yourself, and rewriting them if they do.” This is much more applicable, because now there’s only a single person’s judgment to assess the potential repetition. In some ways, this policy is also more enforceable, because there’s no longer any ambiguity around who is deciding whether a piece of code is a repetition. The challenge is that enforceability now depends on one individual, and making this policy effective will require holding individuals accountable for the quality of their judgement. An organization that’s unwilling to distinguish between good and bad judgment won’t get any value out of the policy. This is a good example of how a good policy in one organization might become a poor policy in another. 
If you ever find yourself wanting to include a policy that for some reason either can’t be applied or can’t be enforced, stop to ask yourself what you’re trying to accomplish and ponder if there’s a different policy that might be better suited to that goal.

Developing novel policies

My experience is that there are vanishingly few truly novel policies to write. There’s almost always someone else who has already done something similar to your intended approach. Calm’s engineering strategy is such a case: the details are particular to the company, but the general approach is common across the industry. The most likely place to find truly novel policies is during the adoption phase of a new widespread technology, such as the rise of ubiquitous mobile phones, cloud computing, or large language models. Even then, as explored in the strategy for adopting large-language models, the new technology can be engaged with as a generic technology: Develop an LLM-backed process for reactivating departed and suspended drivers in mature markets. Through modeling our driver lifecycle, we determined that improving onboarding time will have little impact on the total number of active drivers. Instead, we are focusing on mechanisms to reactivate departed and suspended drivers, which is the only opportunity to meaningfully impact active drivers. You could simply replace “LLM” with “data-driven” and it would be equally readable. In this way, policy can generally sidestep areas of uncertainty by being a bit abstract. This avoids being overly specific about topics you simply don’t know much about.

However, even if your policy isn’t novel to the industry, it might still be novel to you or your organization. The steps that I’ve found useful to debug novel policies are the same steps as running a condensed version of the strategy process, with a focus on exploration and refinement:

- Collect a number of similar policies, with a focus on how those policies differ from the policy you are creating
- Create a systems model to articulate how this policy will work, and also how it will differ from the similar policies you’re considering
- Run a strategy testing cycle for your proto-policy to discover any unknown-unknowns about how it works in practice

Whether you run into this scenario is largely a function of the extent of your, and your organization’s, experience. Early in my career, I found myself doing novel (for me) strategy work very frequently, and these days I rarely find myself doing novel work, instead focusing on adaptation of well-known policies to new circumstances.

Are competing policy proposals an anti-pattern?

When creating policy, you’ll often have to engage with the question of whether you should develop one preferred policy or a series of potential strategies to pick from. Developing these is a useful stage of setting policy, but rather than helping you refine your policy, I’d encourage you to think of this as exposing gaps in your diagnosis. For example, when Stripe developed the Sorbet ruby-typing tooling, there was debate between two policies: Should we build a ruby-typing tool to allow a centralized team to gradually migrate the company to a typed codebase? Should we migrate the codebase to a preexisting strongly typed language like Golang or Java? These were, initially, equally valid hypotheses. It was only by clarifying our diagnosis around resourcing that it became clear that incurring the bulk of costs in a centralized team was clearly preferable to spreading the costs across many teams.
Specifically, recognizing that we wanted to prioritize short-term product engineering velocity, even if it led to a longer migration overall. If you do develop multiple policy options, I encourage you to move the alternatives into an appendix rather than including them in the core of your strategy document. This will make it easier for readers of your final version to understand how to follow your policies, and they are the most important long-term user of your written strategy. Recognizing constraints A similar problem to competing solutions is developing a policy that you cannot possibly fund. It’s easy to get enamored with policies that you can’t meaningfully enforce, but that’s bad policy, even if it would work in an alternate universe where it was possible to enforce or resource it. To consider a few examples: The strategy for controlling access to user data might have proposed requiring manual approval by a second party of every access to customer data. However, that would have gone nowhere. Our approach to Uber’s service migration might have required more staffing for the infrastructure engineering team, but we knew that wasn’t going to happen, so it was a meaningless policy proposal to make. The strategy for navigating private equity ownership might have argued that new ownership should not hold engineering accountable to a new standard on spending. But they would have just invalidated that strategy in the next financial planning period. If you find a policy that contemplates an impractical approach, it doesn’t only indicate that the policy is a poor one, it also suggests your policy is missing an important pillar. Rather than debating the policy options, the fastest path to resolution is to align on the diagnosis that would invalidate potential paths forward. In cases where aligning on the diagnosis isn’t possible, for example because you simply don’t understand the possibilities of a new technology as encountered in the strategy for adopting LLMs, then you’ve typically found a valuable opportunity to use strategy refinement to build alignment. Dealing with missing strategies At a recent company offsite, we were debating which policies we might adopt to deal with annual plans that kept getting derailed after less than a month. Someone remarked that this would be much easier if we could get the executive team to commit to a clearer, written strategy about which business units we were prioritizing. They were, of course, right. It would be much easier. Unfortunately, it goes back to the problem we discussed in the diagnosis chapter about reframing blockers into diagnosis. If a strategy from the company or a peer function is missing, the empowering thing to do is to include the absence in your diagnosis and move forward. Sometimes, even when you do this, it’s easy to fall back into the belief that you cannot set a policy because a peer function might set a conflicting policy in the future. Whether you’re an executive or an engineer, you’ll never have the details you want to make the ideal policy. Meaningful leadership requires taking meaningful risks, which is never something that gets comfortable. Summary After working through this chapter, you know how to develop policy, how to assemble policies to solve your diagnosis, and how to avoid a number of the frequent challenges that policy writers encounter. At this point, there’s only one phase of strategy left to dig into, operating the policies you’ve created.
I was building a small feature for the Flickr Commons Explorer today: show a random selection of photos from the entire collection. I wanted a fast and varied set of photos. This meant getting a random sample of rows from a SQLite table (because the Explorer stores all its data in SQLite). I’m happy with the code I settled on, but it took several attempts to get right.

Approach #1: ORDER BY RANDOM()

My first attempt was pretty naïve – I used an ORDER BY RANDOM() clause to sort the table, then limit the results:

    SELECT *
    FROM photos
    ORDER BY random()
    LIMIT 10

This query works, but it was slow – about half a second to sample a table with 2 million photos (which is very small by SQLite standards). This query would run on every request for the homepage, so that latency is unacceptable.

It’s slow because it forces SQLite to generate a random value for every row, then sort all the rows, and only then apply the limit. SQLite is fast, but there’s only so fast you can sort millions of values.

I found a suggestion from Stack Overflow user Ali to do a random sort on the id column first, pick my IDs from that, and only fetch the whole row for the photos I’m selecting:

    SELECT *
    FROM photos
    WHERE id IN (
        SELECT id
        FROM photos
        ORDER BY RANDOM()
        LIMIT 10
    )

This means SQLite only has to load the rows it’s returning, not every row in the database. This query was over three times faster – about 0.15s – but that’s still slower than I wanted.

Approach #2: WHERE rowid > (…)

Scrolling down the Stack Overflow page, I found an answer by Max Shenfield with a different approach:

    SELECT *
    FROM photos
    WHERE rowid > (
        ABS(RANDOM()) % (SELECT max(rowid) FROM photos)
    )
    LIMIT 10

The rowid is a unique identifier that’s used as a primary key in most SQLite tables, and it can be looked up very quickly. SQLite automatically assigns a unique rowid unless you explicitly tell it not to, or create your own integer primary key.

This query works by picking a point between the biggest and smallest rowid values used in the table, then getting the rows with rowids higher than that point. If you want to know more, Max’s answer has a more detailed explanation.

This query is much faster – around 0.0008s – but I didn’t go this route. The result is more like a random slice than a random sample. In my testing, it always returned contiguous rows – 101, 102, 103, … – which isn’t what I want. The photos in the Commons Explorer database were inserted in upload order, so photos with adjacent row IDs were uploaded at around the same time and are probably quite similar. I’d get one photo of an old plane, then nine more photos of other planes. I want more variety!

(This behaviour isn’t guaranteed – if you don’t add an ORDER BY clause to a SELECT query, the order of results is undefined. SQLite is returning rows in rowid order in my table, and a quick Google suggests that’s pretty common, but that may not be true in all cases. It doesn’t affect whether I want to use this approach, but I mention it here because I was confused about the ordering when I read this code.)

Approach #3: Select random rowid values outside SQLite

Max’s answer was the first time I’d heard of rowid, and it gave me an idea – what if I chose random rowid values outside SQLite? This is a less “pure” approach because I’m not doing everything in the database, but I’m happy with that if it gets the result I want.

Here’s the procedure I came up with:

1. Create an empty list to store our sample.

2. Find the highest rowid that’s currently in use:

       sqlite> SELECT MAX(rowid) FROM photos;
       1913389

3. Use a random number generator to pick a rowid between 1 and the highest rowid:

       >>> import random
       >>> random.randint(1, max_rowid)
       196476

   If we’ve already got this rowid, discard it and generate a new one. (SQLite assigns rowids starting at 1, so the minimum possible value here is always 1.)

4. Look for a row with that rowid:

       SELECT * FROM photos WHERE rowid = 196476

   If such a row exists, add it to our sample. If we have enough items in our sample, we’re done. Otherwise, return to step 3 and generate another rowid. If such a row doesn’t exist, return to step 3 and generate another rowid.
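Putting those steps together, a minimal sketch in Python might look something like this. The photos.db filename and the sample_random_photos name are my own placeholders for illustration, not the actual Commons Explorer code:

    import random
    import sqlite3

    def sample_random_photos(db_path, sample_size=10):
        # Connect to the database and find the highest rowid currently in use.
        conn = sqlite3.connect(db_path)
        max_rowid = conn.execute("SELECT MAX(rowid) FROM photos").fetchone()[0]

        sample = []
        tried_rowids = set()

        while len(sample) < sample_size:
            # Pick a random rowid between 1 and the highest rowid; if we've
            # already tried this rowid, discard it and pick another.
            rowid = random.randint(1, max_rowid)
            if rowid in tried_rowids:
                continue
            tried_rowids.add(rowid)

            # Look for a row with that rowid; if it exists, add it to the sample.
            row = conn.execute(
                "SELECT * FROM photos WHERE rowid = ?", (rowid,)
            ).fetchone()
            if row is not None:
                sample.append(row)

        return sample

    random_photos = sample_random_photos("photos.db", sample_size=10)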
This requires a bit more code, but it returns a diverse sample of photos, which is what I really care about. It’s a bit slower, but still plenty fast enough (about 0.001s).

This approach is best for tables where the rowid values are mostly contiguous – it would be slower if there are lots of rowids between 1 and the max that don’t exist. If there are large gaps in rowid values, you might try multiple missing entries before finding a valid row, slowing down the query. You might want to try something different, like tracking valid rowid values separately.

This is a good fit for my use case, because photos don’t get removed from Flickr Commons very often. Once a row is written, it sticks around, and over 97% of the possible rowid values do exist.

Summary

Here are the four approaches I tried:

    Approach                      Performance (for 2M rows)   Notes
    ORDER BY RANDOM()             ~0.5s                       Slowest, easiest to read
    WHERE id IN (SELECT id …)     ~0.15s                      Faster, still fairly easy to understand
    WHERE rowid > ...             ~0.0008s                    Returns clustered results
    Random rowid in Python        ~0.001s                     Fast and returns varied results, requires code outside SQL

I’m using the random rowid in Python in the Commons Explorer, trading code complexity for speed. I’m using this random sample to render a web page, so it’s important that it returns quickly – when I was testing ORDER BY RANDOM(), I could feel myself waiting for the page to load.

But I’ve used ORDER BY RANDOM() in the past, especially for asynchronous data pipelines where I don’t care about absolute performance. It’s simpler to read and easier to see what’s going on.

Now it’s your turn – visit the Commons Explorer and see what random gems you can find. Let me know if you spot anything cool!

[If the formatting of this post looks odd in your feed reader, visit the original article]