Speeding up code with vectorization

from Kevin Chen [alt+shift+b] in programming

I’ve been writing a lot of math code with latency requirements these days. When I talk to people about my problems, they usually suggest multithreading and general-purpose GPU computing. These both have downsides. Multithreading might not be the best option in systems that already have a lot of concurrent workloads. And GPU computing adds the overhead of a round-trip copy to GPU memory, which might end up increasing latency depending on the problem size. Vectorization can offer a way out. Modern CPUs have several copies of key components like arithmetic logic units (ALUs), allowing them to execute multiple operations in parallel — even on the same CPU.1 Developers can take advantage of this functionality through special vector instructions. For the rest of this article, I’ll use Clang 10 and Intel’s AVX2 extension in all examples. However, the same ideas apply to other compilers and architectures. Here are the rules I’ve developed for vectorization: Option 1: Make a library do it for...

over a year ago

Remove from reading list Add to reading list [alt+a] Read now [→]

Comments

Improve your reading experience

Logged in users get linked directly to articles resulting in a better reading experience. Please login for free, it takes less than 1 minute.

More from Kevin Chen

Real estate is one of the hardest open problems in scaled self driving

I’ve had a minor obsession with Waymo’s autonomous vehicle depots recently. Over the past few months, I’ve flown a drone as part of a stakeout to understand how they work. And I’ve taken a deep dive into an apparent Waymo outage to find the company charging its electric vehicles from temporary diesel generators. The reason for my obsession? I believe depot buildouts will be one of the last hard problems in scaled autonomous driving. Long after the hardware, software, and AI have been perfected, real estate acquisition will remain a limiting factor in large-scale AV deployment. Waymo’s main depot at 201 Toland Street, San Francisco. Will self driving follow software scaling laws? In 2021, Elon Musk claimed that Tesla FSD’s release will be “one of the biggest asset value increases in history.” The day FSD goes to wide release will be one of the biggest asset value increases in history — Elon Musk (@elonmusk) October 20, 2021 Musk is arguing that, once autonomous driving has been solved, it can be instantly rolled out at the push of a button. Nearly all of Tesla’s fleet could be put to productive use without humans behind the wheel. While Musk’s viewpoint is on the extreme end, it’s a sentiment shared by many who have worked on or invested in autonomous driving over the years. Once you have hardware capable of supporting safe driverless operation, it’s just a matter of developing the right software. Software can be replicated infinitely at zero marginal cost. Could autonomous driving therefore scale as quickly as software platforms like Uber or DoorDash? The answer is not so simple. Self-driving cars are still cars — cars that exist in the physical world and need to be parked, fueled, cleaned, and repaired. Uber and other multi-sided marketplace platforms have been able to grow exponentially because they distribute these responsibilities to the individual drivers — many Uber drivers park at their own homes — allowing the platform provider to focus on developing the software pieces. So far, AV companies like Waymo and Cruise have taken a different approach. They’ve preferred to centralize these operational tasks in large depots staffed with their own personnel. This is because AV technology is still maturing and cannot be easily productized in the short term. Additionally, Timothy B. Lee notes in Understanding AI that “having hardware, software, and support services all under one roof makes it easier for Waymo to experiment with different technologies and business models.” When the kinks are still being worked out, it is more straightforward to vertically integrate everything in a single organization. The many jobs to be done of a robotaxi depot Depots for human-driven fleets, such as rental cars or delivery vans, only require a parking area with minimal additional infrastructure. This enables a fairly straightforward trade-off between location and cost: the fleet operator seeks a location close to customer demand while minimizing rent. For example, a logistics company participating in Amazon’s Delivery Service Partner program can run its depot from any sufficiently cheap parking lot near the local Amazon warehouse. The same constraints affect depot selection for autonomous vehicles. However, the depot also needs to be more than just a convenient parking lot to store off-duty cars. Because AVs are often also EVs, the ideal site also has electric vehicle charging. Because AVs need to upload driving logs to the cloud, it should have a high-speed Internet connection too. Let’s explore these constraints in detail. Location Depots should be placed close to customer demand to minimize deadheading (non-revenue driving), which would raise costs while degrading the customer experience with longer pickup times. Ideal depots are therefore located in desirable residential or commercial areas, where there is more competition among potential tenants. Placing depots in high-demand neighborhoods instead of industrial areas can also increase the probability of local opposition. Waymo has already encountered opposition during a proposed expansion of their main depot, even though it is located in an industrial neighborhood with many similar facilities. Again from Timothy B. Lee: Waymo sought a permit to convert the warehouse next door into some office space and a parking lot for Waymo employees. San Francisco’s Board of Supervisors unanimously rejected Waymo’s application. The rejection was partly based on fears that Waymo would eventually use the space to launch a delivery service in the city (Waymo hasn’t announced any plans to do this so far). But it also reflected city leaders’ frustration with their general lack of power over Waymo. Now consider the recent incident in which driverless Waymo vehicles honked at each other while entering a depot near residential buildings in San Francisco, often well into the early hours of the morning. While residents and the company resolved the situation amicably, it will surely be raised in future discussions of new Waymo depots in residential areas should they come before the Board of Supervisors or Planning Commission. Electric vehicle charging AV developers have preferred to run their services with electric vehicles. Although AV and EV technologies are not inherently coupled, running a fully electric fleet adds an environmental angle to the AV sales pitch, allowing the companies to claim that AV rides reduce emissions by displacing gas-powered driving. An EV fleet also lowers vehicle maintenance costs. Waymo and Cruise each have locations with DC fast charging capability. This approach avoids relying on public chargers. Taking Waymo’s primary San Francisco depot as an example, the company installed 38 chargers of approximately 60 kW each, implying a total site power of around 2.4 MW. Waymo vehicles charging in San Francisco. Approximately one-third of parking spots in the main depot have charging. Bringing in so many high-power chargers likely added significant complexity to Waymo’s depot construction. While we don’t Waymo’s process, we have a fairly good benchmark from the Tesla community, which tracks Tesla Supercharger installations closely. From Bruce Mah, a seasoned EV charging observer, the construction process is: As with any construction project, things usually start with selecting a site and permitting. There will often be some demolition / excavation of part of a parking lot (Superchargers are often built in existing parking lots). Tesla equipment such as charging cabinets, posts, etc. will usually be installed next (see T1 below). Eventually there will be some inspections from the local Authority Having Jurisdiction (AHJ). A utility transformer (from PG&E, SCE, etc.) is usually the last piece of equipment to be installed. Repaving, painting, and installation of parking stops will also usually happen late in the process, as well as landscaping and lighting enhancements. Of these steps, permitting and utility work are not within the charging operator’s control. California municipalities, especially San Francisco, have a notoriously slow and political permitting process. With PG&E, the utility serving much of the state, electrical service upgrades involving a new distribution transformer can take months. Timelines aside, building out a charging site is also expensive. For example, an agreement between Tesla and the City of West Hollywood values an eight-plug location at $482,942 for both equipment and construction. Data offload The final piece of the puzzle is data offload. Autonomous vehicles log vast amounts of data as they drive, measured in hundreds of GBs to TBs per hour. Some of the data is subject to mandatory retention and must be uploaded for later review. At a minimum, all AV collisions in California must be reported to the DMV. Regulators at all levels of government expect the AV developer to present analyses of serious incidents, including recordings from the vehicle and explanations of the AV’s decisions. In addition to regulatory requirements, the AV developer often wants to return much more data for engineering purposes: near misses, stuck events, novel or interesting scenarios, and more. The upshot is that the AV operator needs to upload a substantial portion of the hundreds of GBs to TBs logged per hour of driving. Uploading over cellular networks would not be cost effective. These transfers must occur at a depot. Today, it’s likely that Waymo and Cruise use disk swapping for data offload. When a car fills up its internal logging disk, it notifies an operator to plug in a fresh one. The full disks may be uploaded directly from the depot or shipped to a datacenter. This whole process is labor intensive and, over time, may pose a reliability concern due to dust or water ingress. A Waymo operator performs a possible disk swap. Many AV developers are moving toward direct data transfer from the vehicle using Ethernet, Wi-Fi, or a private 5G network, which reduces the number of manual touch points and moving parts. Charging is a great time to perform these transfers. However, this imposes an additional requirement on the depot: a fast upload speed, probably a fiber connection of at least 10 Gbps. Where do we go from here? When we put all three requirements together (great location, high-power EV charging, and high-speed Internet), there may be few to none sites that fit the bill. This would require the AV operator to take on site-specific construction projects to add amenities like charging and Internet — a strategy that sits in direct opposition to rapid and cost-effective scaling. Another possibility is to engineer ways to relax the constraints. Decoupling the requirements Waymo and Cruise do not require all of their locations to have charging and data offload. For example, Waymo operates satellite lots in downtown San Francisco only for storing their off-hail vehicles. Every night, fleet management software instructs the cars to travel back to the main depot for charging and data offload. A Waymo satellite location in San Francisco with minimal staffing, no charging, and apparently no data offloading. This solution works as long as the total charging and data transfer capacity across all locations exceeds the average throughput required to keep the fleet in working order. However, the lack of redundancy can lead to cascading failures, such as the apparent power outage at Waymo’s main depot that led the company to shut down many vehicles during a Friday evening rush hour. Reducing charging power Waymo and Cruise currently use DC fast charging (DCFC) for their fleets. Level 2 (L2) or AC charging could reduce the cost of buildouts because the equipment is cheaper and can often be added without bringing a new utility transformer. This could enable overnight charging in satellite locations that do not currently have any charging capacity. Imagine an operator showing up to plug in all the cars at night when there is little demand, then returning in the morning to unplug them. There is an order of magnitude speed difference between L2 and DCFC. This is important for consumer charging, where the consumer cares about the time to get a single car back on the road. However, charging power for any individual car becomes less important when charging a large fleet. Fleet operators care about the total throughput of turning around cars, which is proportional to total power delivered across all chargers. In other words, assuming an autonomous ride hailing service will always have overnight lulls in demand and enough parking spots during those times, the most scalable strategy is to procure your desired total charging power at the lowest price. DCFC equipment costs disproportionately more per kW due to the additional complexity of the charging equipment — and that doesn’t include the additional maintenance complexity. The table below compares ChargePoint’s cheapest L2 and DCFC units: Charger Power (kW) Price ($) Unit Price ($/kW) ChargePoint CPF50 9.6 kW $1,299 $135/kW ChargePoint CPE250 62.5 kW $52,000 $832/kW In addition to more scalable depot buildouts, reduced charging power can also increase the longevity of the vehicle’s traction battery, which is an important factor in managing vehicle depreciation. Reducing data logging rate Most AV developers start out by logging and uploading all data generated on their vehicles. This makes development easy because the data is always there when you need it. These assumptions need to be broken when a growing fleet generates proportionally more logs, most of which contain routine driving and are not very interesting. We can split the data logged by AVs into two categories: Raw sensor data, such as lidar point clouds, camera images, radar returns, and audio. Derived data, such as detections from the perception system or motion plans from the behavior system. One approach is to keep only one category of data. Retaining only the derived data can still enable debugging of serious incidents, as long as the perception system can be trusted to provide a faithful representation of the raw sensor data. On the other hand, retaining only the raw sensor data makes the logs more useful for developing the mapping and perception system. Similar-looking derived data can be generated by running a replay simulator as needed, but it is challenging to reproduce the exact same outputs as those on the vehicle unless the AV software is fully deterministic. Data retention decisions can also be made temporally. The key challenge here is high-recall classification of which time ranges in the log must be retained. For example, if a DMV-reportable collision occurs, the associated log data must never be discarded. These decisions can happen either on-device or in the cloud, but they must be made without uploading the full log to the cloud, since our bottleneck is the connection from the vehicle to the Internet. Conclusion The current trajectory for scaled autonomous driving would require desirable depot locations to include charging and Internet, making real estate acquisition challenging. There exist opportunities to reduce the additional requirements over time with the goal of making the problem closer to “rent a bunch of conveniently located parking lots.” While these are not traditionally considered autonomous driving problems, solving them will be key to unlocking the next phase of scaling.

a year ago • 87 votes

Large language models are a sustaining innovation for Siri

Many people assume that large language models (LLMs) will disrupt existing consumer voice assistants. Compared to Siri, while today’s ChatGPT is largely unable to complete real-world tasks like hailing an Uber, it’s far better than Siri at understanding and generating language, especially in response to novel requests. From Tom’s Hardware, this captures the sentiment I see among tech commentators: GPT-4o will enable ChatGPT to become a legitimate Siri competitor, with real-time conversations via voice that are responded to instantly without lag time. […] ChatGPT’s new real-time responses make tools like Siri and Echo seem lethargic. And although ChatGPT likely won’t be able to schedule your haircuts like Google Assistant can, it did put up admirable real-time translating chops to challenge Google. Last year, there were rumors that OpenAI was working on its own hardware, which would open the possibility of integrating ChatGPT at the system level along the lines of the Humane Ai Pin. Would such a product be able to mount a successful challenge against Siri, Alexa, and Google Assistant? After Apple’s WWDC keynote yesterday and seeing the updated Siri APIs, I think it’s more likely that LLMs are a sustaining innovation for Siri — a technological innovation that strengthens the position of incumbent voice assistants. Apple promised to increase the number of ways in which Siri can take action in apps. Source: Apple. Traditional voice assistants work by matching the user’s queries to a fixed set of intents. Pierce Freeman explains the general approach: The previous generation of personal assistants had control logic that was largely hard-coded. They revolved around the idea of an intent — a known task that a user wanted to do like sending a message, searching for weather, etc. Detecting this intent might be keyword based or trained into a model that converts a sequence to a one-hot class space. But generally speaking there were discrete tasks and the job of the NLU pipeline was to delegate it to sub-modules. Once the query has been matched to an intent, the next step is to “fill in the blanks” for any inputs needed by the intent: If it believes you’re looking for weather, a sub-module would attempt to detect what city you’re asking about. This motivated a lot of the research into NER (named entity recognition) to detect the more specific objects of interest and map them to real world quantities. city:San Francisco and city:SF to id:4467 for instance. Conversational history was implemented by keeping track of what the user had wanted in previous steps. If a new message is missing some intent, it would assume that a previous message in the flow had a relevant intent. This process of back-detecting the relevant intent was mostly hard-coded or involved a shallow model. A natural outcome is that Apple is forced to develop an expansive and complicated system of intents because it is the only way to expand the assistant’s capabilities. In 2016, Apple also allowed third-party developers to integrate their apps’ functionality by providing intents through SiriKit. Once the developer defines the inputs and outputs, the intents could appear in Siri, the Shortcuts app, proactive notifications, etc. alongside first-party intents by Apple. Similar frameworks exist on other platforms: Android App Actions and Alexa Skills. However, no matter how rich the intent library becomes, the overall user experience can still suffer if access is gated by a brittle intent-matching process: either (1) matching to the incorrect intent, or (2) after matching the correct intent, parsing the request parameters incorrectly. In my opinion, the intent matching stage is the primary source of users’ frustration with Siri. Incorrect named entity recognition by Siri. Contrast this with ChatGPT plugins, a similar system that allows the model to interact with external APIs by determining which plugin might be relevant to the user’s request, then reading the plugin’s API specification to determine the input and output parameters. In other words, the intent matching is performed by an LLM. The generalist nature of LLMs seems to reduce brittleness. For example, when using the code interpreter plugin, the model can write arbitrary Python code and fix resulting runtime errors. The main issue for challengers (OpenAI, Humane, and Rabbit) is the lack of third-party integrations to make their assistants helpful in consumers’ digital lives, extending beyond general knowledge tasks. For example: The Humane Ai Pin only streams music from Tidal, not Spotify nor Apple Music. The Rabbit R1 “large action model” is, in reality, just a few handwritten UI automation scripts for the applications shown in their demo. The system does not appear to generalize to unseen applications. In general, while companies working on UI agents have shown some limited demos, I’m not aware of any that run with high reliability and scale. Even if they achieve scale and generalization, their agents could be made less reliable by app developers using anti-scraping techniques because the developers prefer to own the customer relationship, or as leverage for future partnership negotiations. This type of system is probably a few years out at a minimum. Without a large user base, developers have no incentive to port their apps, leaving the integration work to the platform owner, as in the cases of Humane and Rabbit. Meanwhile, Apple, Amazon, and Google each have a pre-existing app ecosystem. Their position as aggregators means developers are highly motivated to access the enormous installed base of iOS, Alexa, and Android devices. Assuming LLM technology will become a commodity, the incumbents’ in-house LLMs ought to be able to provide decent intent matching and language skills. Combined with an expansive library of intents, it seems very possible that integrating LLMs will cement the incumbent voice assistants as the dominant platforms. Challengers might be better off building products in areas that don’t depend on third-party integrations for key functionality.

a year ago • 79 votes

How autonomous vehicle simulation works

When autonomous vehicle developers justify the safety of their driverless vehicle deployments, they lean heavily on their testing in simulation. Common talking points take the form of “we made our car drive X billion miles in simulation.” From these vague statements, it’s challenging to determine what a simulator is, or how it works. There’s more to simulation than endless driving in a virtual environment. For example, Waymo’s technology overview page says (emphasis mine): We’ve driven more than 20 billion miles in simulation to help identify the most challenging situations our vehicles will encounter on public roads. We can either replay and tweak real-world miles or build completely new virtual scenarios, for our autonomous driving software to practice again and again. Cruise’s safety page contains similar language:1 Before setting out on public roads, Cruise vehicles complete more than 250,000 simulations and closed course testing during everyday and extreme conditions. The main impression one gets from these overviews is that (1) simulation can test many driving scenarios, and (2) everyone will be super impressed if you use it a lot. Going one layer deeper to the few blog posts and talks full of slick GIFs, you might reach the conclusion that simulation is like a video game for the autonomous vehicle in the vein of Grand Theft Auto (GTA): a fully generated 3D environment complete with textures, lighting, and non-player characters (NPCs). Much like human players of GTA, the autonomous vehicle would be able to drive however it likes, freed from real-world consequences. Source: Cruise. While this type of fully synthetic simulation exists in the world of autonomous driving, it’s actually the least commonly used type of simulation.2 Instead, just as a software developer leans on many kinds of testing before releasing an application, an AV developer runs many types of simulation before deploying an autonomous vehicle. Each type of simulation is best suited for a particular use case, with trade-offs between realism, coverage, technical complexity, and cost to operate. In this post, we’ll walk through the system design of a simulator at a hypothetical AV company, starting from first principles. We may never know the details of the actual simulator architecture used by any particular AV developer. However, by exploring the design trade-offs from first principles, I hope to shed some light on how this key system works. Contents Our imaginary self-driving car Replay simulation Interactivity and the pose divergence problem Synthetic simulation The high cost of realistic imagery Round-trip conversions to pixels and back Skipping the sensor data Making smart agents Generating scene descriptions Limitations of pure synthetic simulation Hybrid simulation Conclusion Our imaginary self-driving car Let’s begin by defining our hypothetical autonomous driving software, which will help us illustrate how simulation fits into the development process. Imagine it’s 2015, the peak of self-driving hype, and our team has raised a vast sum of money to develop an autonomous vehicle. Like a human driver, our software drives by continuously performing a few basic tasks: It makes observations about the road and other road users. It reasons about what others might do and plans how it should drive. Finally, it executes those planned motions by steering, accelerating, and braking. Rinse and repeat. This mental model helps us group related code into modules, enabling them to be developed and tested independently. There will be four modules in our system:3 Sensor Interface: Take in raw sensor data such as camera images and lidar point clouds. Sensing: Detect objects such as vehicles, pedestrians, lane lines, and curbs. Behavior: Determine the best trajectory (path) for the vehicle to drive. Vehicle Interface: Convert the trajectory into steering, accelerator, and brake commands to control the vehicle’s drive-by-wire (DBW) system. We connect our modules to each other using an inter-process communication framework (“middleware”) such as ROS, which provides a publish–subscribe system (pubsub) for our modules to talk to each other. Here’s a concrete example of our module-based encapsulation system in action: The sensing module publishes a message containing the positions of other road users. The behavior module subscribes to this message when it wants to know whether there are pedestrians nearby. The behavior module doesn’t know and doesn’t care how the perception module detected those pedestrians; it just needs to see a message that conforms to the agreed-upon API schema. Defining a schema for each message also allows us to store a copy of everything sent through the pubsub system. These driving logs will come in handy for debugging because it allows us to inspect the system with module-level granularity. Our full system looks like this: Simplified architecture diagram for an autonomous vehicle. Now it’s time to take our autonomous vehicle for a spin. We drive around our neighborhood, encountering some scenarios in which our vehicle drives incorrectly, which cause our in-car safety driver to take over driving from the autonomous vehicle. Each disengagement gets reviewed by our engineering team. They analyze the vehicle’s logs and propose some software changes. Now we need a way to prove our changes have actually improved performance. We need the ability to compare the effectiveness of multiple proposed fixes. We need to do this quickly so our engineers can receive timely feedback. We need a simulator! Replay simulation Motivated by the desire to make progress quickly, we try the simplest solution first. The key insight: our software modules don’t care where the incoming messages come from. Could we simulate a past scenario by simply replaying messages from our log as if they were being sent in real time? As the name suggests, this is exactly how replay simulation works. Under normal operation, the input to our software is sensor data captured from real sensors. The simulator replaces this by replaying sensor data from an existing log. Under normal operation, the output of our software is a trajectory (or a set of accelerator and steering commands) that the real car executes. The simulator intercepts the output to control the simulated vehicle’s position instead. Modified architecture diagram for running replay simulation. There are two primary ways we can use this type of simulator, depending on whether we use a different software version as the onroad drive: Different software: By running modified versions of our modules in the simulator, we can get a rough idea of how the changes will affect the vehicle’s behavior. This can provide early feedback on whether a change improves the vehicle’s behavior or successfully fixes a bug. Same software: After a disengagement, we may want to know what would have happened if the autonomous vehicle were allowed to continue driving without human input. Simulation can provide this counterfactual by continuing to play back messages as if the disengagement never happened. We’ve gained these important testing capabilities with relatively little effort. Rather than take on the complexity of a fully generated 3D environment, we got away with a few modifications to our pubsub framework. Interactivity and the pose divergence problem The simplicity of a pure replay simulator also leads to its key weakness: a complete lack of interactivity. Everything in the simulated environment was loaded verbatim from a log. Therefore, the environment does not respond to the simulated vehicle’s behavior, which can lead to unrealistic interactions with other road users. This classic example demonstrates what can happen when the simulated vehicle’s behavior changes too much: Watch on YouTube. Dragomir Anguelov’s guest lecture at MIT. Source: Lex Fridman. Our vehicle, when it drove in the real world, was where the green vehicle is. Now, in simulation, we drove differently and we have the blue vehicle. So we’re driving…bam. What happened? Well, there is a purple agent over there — a pesky purple agent — who, in the real world, saw that we passed them safely. And so it was safe for them to go, but it’s no longer safe, because we changed what we did. So the insight is: in simulation, our actions affect the environment and needed to be accounted for. Anguelov’s video shows the simulated vehicle driving slower than the real vehicle. This kind of problem is called pose divergence, a term that covers any simulation where differences in the simulated vehicle’s driving decisions cause its position to differ from the real-world vehicle’s position. In the video, the pose divergence leads to an unrealistic collision in simulation. A reasonable driver in the purple vehicle’s position would have observed the autonomous vehicle and waited for it to pass before entering the intersection.4 However, in replay simulation, all we can do is play back the other driver’s actions verbatim. In general, problems arising from the lack of interactivity mean the simulated scenario no longer provides useful feedback to the AV developer. This is a pretty serious limitation! The whole point of the simulator is to allow the simulated vehicle to make different driving decisions. If we cannot trust the realism of our simulations anytime there is an interaction with another road user, it rules out a lot of valuable use cases. Synthetic simulation We can solve these interactivity problems by using a simulated environment to generate synthetic inputs that respond to our vehicle’s actions. Creating a synthetic simulation usually starts with a high-level scene description containing: Agents: fully interactive NPCs that react to our vehicle’s behavior. Environments: 3D models of roads, signs, buildings, weather, etc. that can be rendered from any viewpoint. From the scene description, we can generate different types of synthetic inputs for our vehicle to be injected at different layers of its software stack, depending on which modules we want to test. In synthetic sensor simulation, the simulator uses a game engine to render the scene description into fake sensor data, such as camera images, lidar point clouds, and radar returns. The simulator sets up our software modules to receive the generated imagery instead of sensor data logged from real-world driving. Modified architecture diagram for running synthetic simulation with generated sensors. The same game engine can render the scene from any arbitrary perspective, including third-person views. This is how they make all those slick highlight reels. The high cost of realistic imagery Simulations that generate fake sensor data can be quite expensive, both to develop and to run. The developer needs to create a high-quality 3D environment with realistic object models and lighting rivaling AAA games. Example of Cruise’s synthetic simulation showing the same scene rendered into synthetic camera, lidar, and radar data. Source: Cruise. For example, a Cruise blog post mentions some elements of their synthetic simulation roadmap (emphasis mine): With limited time and resources, we have to make choices. For example, we ask how accurately we should model tires, and whether or not it is more important than other factors we have in our queue, like modeling LiDAR reflections off of car windshields and rearview mirrors or correctly modeling radar multipath returns. Even if rendering reflections and translucent surfaces is already well understood in computer graphics, Cruise may still need to make sure their renderer generates realistic reflections that resemble their lidar. This challenge gives a sense of the attention to detail required. It’s only one of many that needs to be solved when building a synthetic sensor simulator. So far, we have only covered the high development costs. Synthetic sensor simulation also incurs high variable costs every time simulation is run. Round-trip conversions to pixels and back By its nature, synthetic sensor simulation performs a round-trip conversion to and from synthetic imagery to test the perception system. The game engine first renders its scene description to synthetic imagery for each sensor on the simulated vehicle, burning many precious GPU-hours in the process, only to have the perception system perform the inverse operation when it detects the objects in the scene to produce the autonomous vehicle’s internal scene representation.5 Every time you launch a synthetic sensor simulation, NVIDIA, Intel, and/or AWS are laughing all the way to the bank. Despite the expense of testing the perception system with synthetic simulation, it is also arguably less effective than testing with real-world imagery paired with ground truth labels. With real imagery, there can be no question about its realism. Synthetic imagery never looks quite right. These practical limitations mean that synthetic sensor simulation ends up as the least used simulator type in AV companies. Usually, it’s also the last type of simulator to be built at a new company. Developers don’t need synthetic imagery most of the time, especially when they have at their disposal a fleet of vehicles that can record the real thing. On the other hand, we cannot easily test risky driving behavior in the real world. For example, it is better to synthesize a bunch of red light runners than try to find them in the real world. This means we are primarily using synthetic simulation to test the behavior system. Skipping the sensor data In synthetic agent simulation, the simulator uses a high-level scene description to generate synthetic outputs from the perception/sensing system. In software development terms, it’s like replacing the perception system with a mock to focus on testing downstream components. This type of simulation requires fewer computational resources to run because the scene description doesn’t need to make a round-trip conversion to sensor data. Modified architecture diagram for running synthetic simulation with generated agents. With image quality out of the picture, the value of synthetic simulation rests solely on the quality of the scenarios it can create. We can split this into two main challenges: designing agents with realistic behaviors generating the scene descriptions containing various agents, street layouts, and environmental conditions Making smart agents You could start developing the control policy for a smart agent similar to NPC design in early video games. A basic smart agent could simply follow a line or a path without reacting to anyone else, which could be used to test the autonomous vehicle’s reaction to a right of way violation. A fancier smart agent could follow a path while also maintaining a safe following distance from the vehicle in front. This type of agent could be placed behind our simulated vehicle, resolving the rear-ending problem mentioned above. Like an audience of demanding gamers, the users of our simulator quickly expect increasingly complex and intelligent behaviors from the smart agents. An ideal smart agent system would capture the full spectrum of every action that other road users could possibly take. This system would also generate realistic behaviors, including realistic-looking trajectories and reaction times, so that we can trust the outcomes of simulations involving smart agents. Finally, our smart agents need to be controllable: they can be given destinations or intents, enabling developers to design simulations that test specific scenarios. Watch on YouTube. Two Cruise simulations in which smart agents (orange boxes) interact with the autonomous vehicle. In the second simulation, two parked cars have been inserted into the bottom of the visualization. Notice how the smart agents and the autonomous vehicle drive differently in the two simulations as they interact with each other and the additional parked cars. Source: Cruise. Developing a great smart agent policy ends up falling in the same difficulty ballpark as developing a great autonomous driving policy. The two systems may even share technical foundations. For example, they may have a shared component that is trained to predict the behaviors of other road users, which can be used for both planning our vehicle’s actions and for generating realistic agents in simulation. Generating scene descriptions Even with the ability to generate realistic synthetic imagery and realistic smart agent behaviors, our synthetic simulation is not complete. We still need a broad and diverse dataset of scene descriptions that can thoroughly test our vehicle. These scene descriptions usually come from a mix of sources: Automatic conversion from onroad scenarios: We can write a program that takes a logged real-world drive, guesses the intent of other road users, and stores those intents as a synthetic simulation scenario. Manual design: Analogous to a level editor in a video game. A human either builds the whole scenario from scratch or makes manual edits to an automatic conversion. For example, a human can design a scenario based on a police report of a human-on-human-driver collision to simulate what the vehicle might have done in that scenario. Generative AI: Recent work from Zoox uses diffusion models trained on a large dataset of onroad scenarios. Example of a real-world log (top) converted to a synthetic simulation scenario, then rendered into synthetic camera images (bottom). Notice how some elements, such as the protest signs, are not carried over, perhaps because they are not supported by the perception system or the scene converter. Source: Cruise. Scenarios can also be fuzzed, where the simulator adds random noise to the scene parameters, such as the speed limit of the road or the goals of simulated agents. This can upsample a small number of converted or manually designed scenes to a larger set that can be used to check for robustness and prevent overfitting. Fuzzing can also help developers understand the space of possible outcomes, as shown in the example below, which fuzzes the reaction time of a synthetic tailgater: An example of fuzzing tailgater reaction time. Source: Waymo. The distribution on the right shows a dot for each variant of the scenario, colored green or red depending on whether a simulated collision occurred. In this experiment, the collision becomes unavoidable once the simulated tailgater’s reaction time exceeds about 1 second. Limitations of pure synthetic simulation With these sources plus fuzzing, we’ve ensured the quantity of scenarios in our library, but we still don’t have any guarantees on the quality. Perhaps the scenarios we (and maybe our generative AI tools) invent are too hard or too easy compared to the distribution of onroad driving our vehicle encounters. If our vehicle drives poorly in a synthetic scenario, does the autonomous driving system need improvement? Or is the scenario unrealistically hard, perhaps because the behavior of its smart agents is too unreasonable? If our vehicle passes with flying colors, is it doing a good job? Or is the scenario library missing some challenging scenarios simply because we did not imagine that they could happen? This is a fundamental problem of pure synthetic simulation. Once we start modifying and fuzzing our simulated scenarios, there isn’t a straightforward way to know whether they remain representative of the real world. And we still need to collect a large quantity of real-world mileage to ensure that we have not missed any rare scenarios. Hybrid simulation We can combine our two types of simulator into a hybrid simulator that takes advantages of the strengths of each, providing an environment that is both realistic and interactive without breaking the bank. From replay simulation, use log replay to ensure every simulated scenario is rooted in a real-world scenario and has perfectly realistic sensor data. From synthetic simulation, make the simulation interactive by selectively replacing other road users with smart agents if they could interact with our vehicle.6 Modified architecture diagram merging parts of replay and synthetic simulation. Hybrid simulation usually serves as the default type of simulation that works well for most use cases. One convenient interpretation is that hybrid simulation is a worry-free replacement for replay simulation: anytime the developer would have used replay, they can absentmindedly switch to hybrid simulation to take care of the most common simulation artifacts while retaining most of the benefits of replay simulation. Conclusion We’ve seen that there are many types of simulation used in autonomous driving. They exist on a spectrum from purely replaying onroad scenarios to fully synthesized environments. The ideal simulation platform allows developers to pick an operating point on that spectrum that fits their use case. Hybrid simulation based on a large volume of real-world miles satisfies most testing needs at a reasonable cost, while fully synthetic modes serve niche use cases that can justify the higher development and operating costs. Cruise has written several deep dives about the usage and scaling of their simulation platform. However, neither Cruise nor Waymo provide many details on the construction of their simulator. ↩ I’ve even heard arguments that it’s only good for making videos. ↩ There exist architectures that are more end-to-end. However, to the best of my knowledge, those systems do not have driverless deployments with nontrivial mileage, making simulation testing less relevant. ↩ Another interactivity problem arises from the replay simulator’s inability to simulate different points of view as the simulated vehicle moves. A large pose divergence often causes the simulated vehicle to drive into an area not observed by the vehicle that produced the onroad log. For example, a simulated vehicle could decide to drive around a corner much earlier. But it wouldn’t be able to see anything until the log data also rounds the corner. No matter where the simulated vehicle drives, it will always be limited to what the logged vehicle saw. ↩ “Computer vision is inverse computer graphics.” ↩ As a nice bonus, because the irrelevant road users are replayed exactly as they drove in real life, this may reduce the compute cost of simulation. ↩

a year ago • 67 votes

Why autonomous trucking is harder than autonomous rideshare

Recently, The Verge asked, “where are all the robot trucks?” It’s a good question. Trucking was supposed to be the ideal first application of autonomous driving. Freeways contain predictable, highly structured driving scenarios. An autonomous truck would not have to deal with the complexities of intersections and two-way traffic. It could easily drive hundreds of miles without encountering a single pedestrian. DALL-E 3 prompt: “Generate an artistic, landscape aspect ratio watercolor painting of a truck with a bright red cab, pulling a white trailer. The truck drives uphill on an empty, rural highway during wintertime, lined with evergreen trees and a snow bank on a foggy, cloudy day.” The trucks could also be commercially viable with only freeway driving capability, or freeways plus a short segment of surface streets needed to reach a transfer hub. The AV company would only need to deal with a limited set of businesses as customers, bypassing the messiness of supporting a large pool of consumers inherent to the B2C model. Autonomous trucks would not be subject to rest requirements. As The Verge notes, “truck operators are allowed to drive a maximum of 11 hours a day and have to take a 30-minute rest after eight consecutive hours behind the wheel. Autonomous trucks would face no such restrictions,” enabling them to provide a service that would be literally unbeatable by a human driver. If you had asked me in 2018, when I first started working in the AV industry, I would’ve bet that driverless trucks would be the first vehicle type to achieve a million-mile driverless deployment. Aurora even pivoted their entire company to trucking in 2020, believing it to be easier than city driving. Yet sitting here in 2024, we know that both Waymo and Cruise have driven millions of miles on city streets — a large portion in the dense urban environment of San Francisco — and there are no driverless truck deployments. What happened? I think the problem is that driverless autonomous trucking is simply harder than driverless rideshare. The trucking problem appears easier at the outset, and indeed many AV developers quickly reach their initial milestones, giving them false confidence. But the difficulty ramps up sharply when the developer starts working on the last bit of polish. They encounter thorny problems related to the high speeds on freeways and trucks’ size, which must be solved before taking the human out of the driver’s seat. What is the driverless bar? Here’s a simplistic framework: No driver in the vehicle. No guarantee of a timely response from remote operators or backend services. Therefore, all safety-critical decisions must be made by the onboard computer alone. Under these constraints, the system still meets or exceeds human safety level. This is a really, really high bar. For example, on surface streets, this means the system on its own is capable of driving at least 100k miles without property damage and 40M miles without fatality.1 The system can still have flaws, but virtually all of those problems must result in a lack of progress, rather than collision or injury. In short, while the system may not know the right thing to do in every scenario, it should never do the wrong thing. (There are several high quality safety frameworks for those interested in a rigorous definition.23 It’s beyond the scope of this post.) Now, let’s look at each aspect of trucking to see how it exacerbates these challenges. Truck-specific challenges Stopping distance vs. sensing range The required sensor capability for an autonomous vehicle is determined by the most challenging scenario that the vehicle needs to handle. A major challenge in trucking is stopping behind a stalled vehicle or large debris in a travel lane. To avoid collision, the autonomous vehicle would need a sensing range greater than or equal to its stopping distance. We’ll make a simplifying assumption that stopping distance defines the minimum detection range requirements. A driverless-quality perception system needs perfect recall on other vehicles within the vehicle’s worst-case stopping distance. Passenger vehicles can decelerate up to –8 m/s². Trucks can only achieve around –4 m/s², which increases the stopping distance and puts the sensing range requirement right at the edge of what today’s sensors can deliver. Here are the sight stopping distances for an empty truck in dry conditions on roads of varying grade:4 Speed (mph) 0% Grade (m) –3% Grade (m) –6% Grade (m) 50 115–141 124–150 136–162 70 122–178 136–162 236–305 Sight stopping distances defined as the distance needed to stop assuming a 2.5-second reaction time with no braking, followed by maximum braking. The distance is computed for an empty truck in dry conditions on roads of varying grade. Stopping distance increases in wet weather or when driving downhill with a load (not shown). Now let’s compare these distances with the capabilities of various sensors: Lidar sensors provide trustworthy 3D data because they take direct measurements based on physical principles. They have a usable range of around 200–250 meters, plenty for city driving but not enough for every truck use case. Lidar detection models may also need to accumulate multiple scans/frames over time to detect faraway objects reliably, especially for smaller items like debris, further decreasing the usable detection range. Note that some solid-state lidars claim significantly more range than 250 meters. These numbers are collected under ideal conditions; for computing minimum sensing capability, we are interested in the range that can provide perfect recall and really great precision. For example, the lidar may be unable to reach its maximum range over the entire field of view, or may require undesirable trade-offs like a scan pattern that reduces point density and field of view to achieve more range. Radar can see farther than lidar. For example, this high-end ZF radar claims vehicle detections up to 350 meters away. Radar is great for tracking moving vehicles, but has trouble distinguishing between stationary vehicles and other background objects. Tesla Autopilot has infamously shown this problem by braking for overpasses and running into stalled vehicles. “Imaging” radars like the ZF device will do better than the radars on production vehicles. They still do not have the azimuth resolution to separate objects beyond 200 meters, where radar input is most needed. Cameras can detect faraway objects as long as there are enough pixels on the object, which leads to the selection of cameras with high resolution and a narrow field of view (telephoto lens). A vehicle will carry multiple narrow cameras for full coverage during turns. However, cameras cannot measure distance or speed directly. A combined camera + radar system using machine learning probably has the best chance here, especially with recent advances in ML-based early fusion, but would need to perform well enough to serve as the primary detection source beyond 200 meters. Training such a model is closer to an open problem than simply receiving that data from a lidar. In summary, we don’t appear to have any sensing solutions with the performance needed for trucks to meet the driverless bar. Controls Controlling a passenger vehicle — determining the amount of steering and throttle input to make the vehicle follow a trajectory — is a simpler problem than controlling a truck. For example, passenger vehicles are generally modeled as a single rigid body, while a truck and its trailer can move separately. The planner and controller need to account for this when making sharp turns and, in extreme low-friction conditions, to avoid jackknifing. These features come in addition to all the usual controls challenges that also apply to passenger vehicles. They can be built but require additional development and validation time. Freeway-specific challenges OK, so trucks are hard, but what about the freeway part? It may now sound appealing to build L4 freeway autonomy for passenger vehicles. However, driving on freeways also brings additional challenges on top of what is needed for city streets. Achieving the minimal risk condition on freeways Autonomous vehicles are supposed to stop when they detect an internal fault or driving situation that they can’t handle. This is called the minimal risk condition (MRC). For example, an autonomous passenger vehicle that detects an error in the HD map or a sensor failure might be programmed to execute a pullover or stop in lane depending on the problem severity. While MRC behaviors are annoying for other road users and embarrassing for the AV developer, they do not add undue risk on surface streets given the low speeds and already chaotic nature of city driving. This gives the AV developer more breathing room (within reason) to deploy a system that does not know how to handle every driving scenario perfectly, but knows enough to stay out of trouble. It’s a different story on the freeway. Stopping in lane becomes much more dangerous with the possibility of a rear-end collision at high speed. All stopping should be planned well in advance, ideally exiting at the next ramp, or at least driving to the closest shoulder with enough room to park. This greatly increases the scope of edge cases that need to be handled autonomously and at freeway speeds. For example: Scene understanding: If the vehicle encounters an unexpected construction zone, crash site, or other non-nominal driving scenario, it’s not enough to detect and stop. Rerouting, while a viable option on surface streets, usually isn’t an option on freeways because it may be difficult or illegal to make a u-turn by the time the vehicle can see the construction. A freeway under construction is also more likely to be the only path to the destination, especially if the autonomous vehicle in question is not designed to drive on city streets. Operational solutions are also not enough for a scaled deployment. AV developers often disallow their vehicles from routing through known problem areas gathered from manually driven scouting vehicles or announcements made by authorities. For a scaled deployment, however, it’s not reasonable to know the status of every mile of road at all times. Therefore, the system needs to find the right path through unstructured scenarios, possibly following instructions from police directing traffic, even if it involves traffic violations such as driving on the wrong side of the road. We know that current state-of-the-art autonomous vehicles still occasionally drive into wet concrete and trenches, which shows it is nontrivial to make a correct decision. Mapping: If the lane lines have been repainted, and the system normally uses an HD map, it needs to ignore the map and build a new one on-the-fly from the perception system’s output. It needs to distinguish between mapping and perception errors. Uptime: Sensor, computer, and software failures need to be virtually eliminated through redundancy and/or engineering elbow grease. The system needs almost perfect uptime. For example, it’s fine to enter a max-braking MRC when losing a sensor or restarting a software module on surface streets, provided those failures are rare. The same maneuver would be dangerous on the freeway, so the failure must be eliminated, or a fallback/redundancy developed. These problems are not impossible to overcome. Every autonomous passenger vehicle has solved them to some extent, with the remaining edge cases punted to some combination of MRC and remote operators. The difference is that, on freeways, they need to be solved with a very high level of reliability to meet the driverless bar. Freeways are boring The features that make freeways simpler — controlled access, no intersections, one-way traffic — also make “interesting” events more rare. This is a double-edged sword. While the simpler environment reduces the number of software features to be developed, it also increases the iteration time and cost. During development, “interesting” events are needed to train data-hungry ML models. For validation, each new software version to be qualified for driverless operation needs to encounter a minimum number of “interesting” events before comparisons to a human safety level can have statistical significance. Overall, iteration becomes more expensive when it takes more vehicle-hours to collect each event. AV developers can only respond by increasing the size of their operations teams or accepting more time between software releases. (Note that simulation is not a perfect solution either. The rarity of events increases vehicle-hours run in simulation, and so far, nobody has shown a substitute for real-world miles in the context of driverless software validation.) Is it ever going to happen? Trucking requires longer range sensing and more complex controls, increasing system complexity and pushing the problem to the bleeding edge of current sensing capabilities. At the same time, driving on freeways brings additional reliability requirements, raising the quality bar on every software component from mapping to scene understanding. If both the truck form factor and the freeway domain increase the level of difficulty, then driverless trucking might be the hardest application of autonomous driving: City Freeway Cars Baseline Harder Trucks Harder Hardest Now that scaled rideshare is mostly working in cities, I expect to see scaled freeway rideshare next. Does this mean driverless trucking will never happen? No, I still believe AV developers will overcome these challenges eventually. Aurora, Kodiak, and Gatik have all promised some form of driverless deployment by the end of the year. We probably won’t see anything close to a million-mile deployment in 2024 though. Getting there will require advances in sensing, machine learning, and a lot of hard work. Thanks to Steven W. and others for the discussions and feedback. This should be considered a bare minimum because humans perform much better on freeways, raising the bar for AVs. Rough numbers taken from Table 3, passenger vehicle national average on surface streets: Scanlon, J. M., Kusano, K. D., Fraade-Blanar, L. A., McMurry, T. L., Chen, Y. H., & Victor, T. (2023). Benchmarks for Retrospective Automated Driving System Crash Rate Analysis Using Police-Reported Crash Data. arXiv preprint arXiv:2312.13228. (blog) ↩ Kalra, N., & Paddock, S. M. (2016). Driving to safety: How many miles of driving would it take to demonstrate autonomous vehicle reliability? Transportation Research Part A: Policy and Practice, 94, 182-193. ↩ Favaro, F., Fraade-Blanar, L., Schnelle, S., Victor, T., Peña, M., Engstrom, J., … & Smith, D. (2023). Building a Credible Case for Safety: Waymo’s Approach for the Determination of Absence of Unreasonable Risk. arXiv preprint arXiv:2306.01917. (blog) ↩ Computed from tables 1 and 2: Harwood, D. W., Glauz, W. D., & Mason, J. M. (1989). Stopping sight distance design for large trucks. Transportation Research Record, 1208, 36-46. ↩

a year ago • 37 votes

How Cruise vehicles return to the garage autonomously in heavy rain

Cruise doesn’t carry passengers in heavy rain. The operational design domain (ODD) in their CPUC permit (PDF) only allows services in light rain. I’ve always wondered how they implement this operationally. For example, Waymo preemptively launches all cars with operators in the driver’s seat anytime there’s rain in the forecast. Cruise has no such policy: I have never seen them assign operators to customer-facing vehicles. Yet Cruise claims to run up to 100 driverless vehicles concurrently. It would be impractical to dispatch a human driver to each vehicle whenever it starts raining. When the latest atmostpheric river hit San Francisco, I knew it was my chance to find out how it worked. Monitoring the Cruise app As the rain intensified, as expected, all cars disappeared from Cruise’s app and the weather pause icon appeared. But then something unusual happened. The app returned to its normal state. A few cars showed up near a hole in the geofence — and they were actually hailable. Visiting the garage I drove over to find that this street is the entrance to one of Cruise’s garages. The same location has been featured in Cruise executives’ past tweets promoting the service.1 Despite the heavy rain and gusts strong enough to blow my hat/jacket off, a steady stream of Cruise vehicles were returning themselves to the garage in driverless mode. Driverless Cruise vehicles enter the garage during heavy rain. A member of Cruise’s operations team enters the vehicle to drive it into the garage. In total, I observed: 8 driverless vehicles 1 manually driven vehicle 1 support vehicle (unmodified Chevy Bolt not capable of autonomous driving) Two vehicles skip the garage After the first six driverless vehicles returned, the next two kept driving past the garage. I followed them in my own car. They drove for about 16 minutes, handling large puddles and road spray without noticeable comfort issues. Eventually they looped back to the garage and successfully entered. A Cruise vehicle drives through a puddle during its detour. I’m not totally sure what happened here. I can think of two reasonable explanations: Boring: The cars missed the turn for some unknown reason. Exciting: Cruise has implemented logic to avoid overwhelming the operations team’s ability to put cars back in the garage. If there are too many vehicles waiting to return, subsequent cars take a detour to kill time instead of blocking the driveway. Key take-aways Cruise is capable of handling heavy rain in driverless mode. The majority of Cruise vehicles returned to the garage autonomously. This enables them to handle correlated events, such as rain, without deploying a large operations team. Cruise may have implemented “take a lap around the block” logic to avoid congestion at the garage entrance. I can’t find the timelapse of Cruise launching their driverless cars anymore. I’m pretty sure it was posted to Twitter. Please let me know if you have the link! Update: Link to tweet by @kvogt. ↩

over a year ago • 47 votes

More in programming

Performant Full-Disk Encryption on a Raspberry Pi, but Foiled by Twisty UARTs

In my post yesterday (“ARM is great, ARM is terrible (and so is RISC-V)), I described my desire to find ARM hardware with AES instructions to support full-disk encryption, and the poor state of the OS ecosystem around the newer ARM boards. I was anticipating buying either a newer ARM SBC or an x86 mini … Continue reading Performant Full-Disk Encryption on a Raspberry Pi, but Foiled by Twisty UARTs →

11 hours ago • 3 votes

Words are not violence

Debates, at their finest, are about exploring topics together in search for truth. That probably sounds hopelessly idealistic to anyone who've ever perused a comment section on the internet, but ideals are there to remind us of what's possible, to inspire us to reach higher — even if reality falls short. I've been reaching for those debating ideals for thirty years on the internet. I've argued with tens of thousands of people, first on Usenet, then in blog comments, then Twitter, now X, and also LinkedIn — as well as a million other places that have come and gone. It's mostly been about technology, but occasionally about society and morality too. There have been plenty of heated moments during those three decades. It doesn't take much for a debate between strangers on this internet to escalate into something far lower than a "search for truth", and I've often felt willing to settle for just a cordial tone! But for the majority of that time, I never felt like things might escalate beyond the keyboards and into the real world. That was until we had our big blow-up at 37signals back in 2021. I suddenly got to see a different darkness from the most vile corners of the internet. Heard from those who seem to prowl for a mob-sanctioned opportunity to threaten and intimidate those they disagree with. It fundamentally changed me. But I used the experience as a mirror to reflect on the ways my own engagement with the arguments occasionally felt too sharp, too personal. And I've since tried to refocus way more of my efforts on the positive and the productive. I'm by no means perfect, and the internet often tempts the worst in us, but I resist better now than I did then. What I cannot come to terms with, though, is the modern equation of words with violence. The growing sense of permission that if the disagreement runs deep enough, then violence is a justified answer to settle it. That sounds so obvious that we shouldn't need to state it in a civil society, but clearly it is not. Not even in technology. Not even in programming. There are plenty of factions here who've taken to justify their violent fantasies by referring to their ideological opponents as "nazis", "fascists", or "racists". And then follow that up with a call to "punch a nazi" or worse. When you hear something like that often enough, it's easy to grow glib about it. That it's just a saying. They don't mean it. But I'm afraid many of them really do. Which brings us to Charlie Kirk. And the technologists who name drinks at their bar after his mortal wound just hours after his death, to name but one of the many, morbid celebrations of the famous conservative debater's death. It's sickening. Deeply, profoundly sickening. And my first instinct was exactly what such people would delight in happening. To watch the rest of us recoil, then retract, and perhaps even eject. To leave the internet for a while or forever. But I can't do that. We shouldn't do that. Instead, we should double down on the opposite. Continue to show up with our ideals held high while we debate strangers in that noble search for the truth. Where we share our excitement, our enthusiasm, and our love of technology, country, and humanity. I think that's what Charlie Kirk did so well. Continued to show up for the debate. Even on hostile territory. Not because he thought he was ever going to convince everyone, but because he knew he'd always reach some with a good argument, a good insight, or at least a different perspective. You could agree or not. Counter or be quiet. But the earnest exploration of the topics in a live exchange with another human is as fundamental to our civilization as Socrates himself. Don't give up, don't give in. Keep debating.

9 hours ago • 2 votes

first-class merges and cover letters

Although it looks really good, I have not yet tried the Jujutsu (jj) version control system, mainly because it’s not yet clearly superior to Magit. But I have been following jj discussions with great interest. One of the things that jj has not yet tackled is how to do better than git refs / branches / tags. As I underestand it, jj currently has something like Mercurial bookmarks, which are more like raw git ref plumbing than a high-level porcelain feature. In particular, jj lacks signed or annotated tags, and it doesn’t have branch names that always automatically refer to the tip. This is clearly a temporary state of affairs because jj is still incomplete and under development and these gaps are going to be filled. But the discussions have led me to think about how git’s branches are unsatisfactory, and what could be done to improve them. branch merge rebase squash fork cover letters previous branch workflow questions branch One of the huge improvements in git compared to Subversion was git’s support for merges. Subversion proudly advertised its support for lightweight branches, but a branch is not very useful if you can’t merge it: an un-mergeable branch is not a tool you can use to help with work-in-progress development. The point of this anecdote is to illustrate that rather than trying to make branches better, we should try to make merges better and branches will get better as a consequence. Let’s consider a few common workflows and how git makes them all unsatisfactory in various ways. Skip to cover letters and previous branch below where I eventually get to the point. merge A basic merge workflow is, create a feature branch hack, hack, review, hack, approve merge back to the trunk The main problem is when it comes to the merge, there may be conflicts due to concurrent work on the trunk. Git encourages you to resolve conflicts while creating the merge commit, which tends to bypass the normal review process. Git also gives you an ugly useless canned commit message for merges, that hides what you did to resolve the conflicts. If the feature branch is a linear record of the work then it can be cluttered with commits to address comments from reviewers and to fix mistakes. Some people like an accurate record of the history, but others prefer the repository to contain clean logical changes that will make sense in years to come, keeping the clutter in the code review system. rebase A rebase-oriented workflow deals with the problems of the merge workflow but introduces new problems. Primarily, rebasing is intended to produce a tidy logical commit history. And when a feature branch is rebased onto the trunk before it is merged, a simple fast-forward check makes it trivial to verify that the merge will be clean (whether it uses separate merge commit or directly fast-forwards the trunk). However, it’s hard to compare the state of the feature branch before and after the rebase. The current and previous tips of the branch (amongst other clutter) are recorded in the reflog of the person who did the rebase, but they can’t share their reflog. A force-push erases the previous branch from the server. Git forges sometimes make it possible to compare a branch before and after a rebase, but it’s usually very inconvenient, which makes it hard to see if review comments have been addressed. And a reviewer can’t fetch past versions of the branch from the server to review them locally. You can mitigate these problems by adding commits in --autosquash format, and delay rebasing until just before merge. However that reintroduces the problem of merge conflicts: if the autosquash doesn’t apply cleanly the branch should have another round of review to make sure the conflicts were resolved OK. squash When the trunk consists of a sequence of merge commits, the --first-parent log is very uninformative. A common way to make the history of the trunk more informative, and deal with the problems of cluttered feature branches and poor rebase support, is to squash the feature branch into a single commit on the trunk instead of mergeing. This encourages merge requests to be roughly the size of one commit, which is arguably a good thing. However, it can be uncomfortably confining for larger features, or cause extra busy-work co-ordinating changes across multiple merge requests. And squashed feature branches have the same merge conflict problem as rebase --autosquash. fork Feature branches can’t always be short-lived. In the past I have maintained local hacks that were used in production but were not (not yet?) suitable to submit upstream. I have tried keeping a stack of these local patches on a git branch that gets rebased onto each upstream release. With this setup the problem of reviewing successive versions of a merge request becomes the bigger problem of keeping track of how the stack of patches evolved over longer periods of time. cover letters Cover letters are common in the email patch workflow that predates git, and they are supported by git format-patch. Github and other forges have a webby version of the cover letter: the message that starts off a pull request or merge request. In git, cover letters are second-class citizens: they aren’t stored in the repository. But many of the problems I outlined above have neat solutions if cover letters become first-class citizens, with a Jujutsu twist. A first-class cover letter starts off as a prototype for a merge request, and becomes the eventual merge commit. Instead of unhelpful auto-generated merge commits, you get helpful and informative messages. No extra work is needed since we’re already writing cover letters. Good merge commit messages make good --first-parent logs. The cover letter subject line works as a branch name. No more need to invent filename-compatible branch names! Jujutsu doesn’t make you name branches, giving them random names instead. It shows the subject line of the topmost commit as a reminder of what the branch is for. If there’s an explicit cover letter the subject line will be a better summary of the branch as a whole. I often find the last commit on a branch is some post-feature cleanup, and that kind of commit has a subject line that is never a good summary of its feature branch. As a prototype for the merge commit, the cover letter can contain the resolution of all the merge conflicts in a way that can be shared and reviewed. In Jujutsu, where conflicts are first class, the cover letter commit can contain unresolved conflicts: you don’t have to clean them up when creating the merge, you can leave that job until later. If you can share a prototype of your merge commit, then it becomes possible for your collaborators to review any merge conflicts and how you resolved them. To distinguish a cover letter from a merge commit object, a cover letter object has a “target” header which is a special kind of parent header. A cover letter also has a normal parent commit header that refers to earlier commits in the feature branch. The target is what will become the first parent of the eventual merge commit. previous branch The other ingredient is to add a “previous branch” header, another special kind of parent commit header. The previous branch header refers to an older version of the cover letter and, transitively, an older version of the whole feature branch. Typically the previous branch header will match the last shared version of the branch, i.e. the commit hash of the server’s copy of the feature branch. The previous branch header isn’t changed during normal work on the feature branch. As the branch is revised and rebased, the commit hash of the cover letter will change fairly frequently. These changes are recorded in git’s reflog or jj’s oplog, but not in the “previous branch” chain. You can use the previous branch chain to examine diffs between versions of the feature branch as a whole. If commits have Gerrit-style or jj-style change-IDs then it’s fairly easy to find and compare previous versions of an individual commit. The previous branch header supports interdiff code review, or allows you to retain past iterations of a patch series. workflow Here are some sketchy notes on how these features might work in practice. One way to use cover letters is jj-style, where it’s convenient to edit commits that aren’t at the tip of a branch, and easy to reshuffle commits so that a branch has a deliberate narrative. When you create a new feature branch, it starts off as an empty cover letter with both target and parent pointing at the same commit. Alternatively, you might start a branch ad hoc, and later cap it with a cover letter. If this is a small change and rebase + fast-forward is allowed, you can edit the “cover letter” to contain the whole change. Otherwise, you can hack on the branch any which way. Shuffle the commits that should be part of the merge request so that they occur before the cover letter, and edit the cover letter to summarize the preceding commits. When you first push the branch, there’s (still) no need to give it a name: the server can see that this is (probably) going to be a new merge request because the top commit has a target branch and its change-ID doesn’t match an existing merge request. Also when you push, your client automatically creates a new instance of your cover letter, adding a “previous branch” header to indicate that the old version was shared. The commits on the branch that were pushed are now immutable; rebases and edits affect the new version of the branch. During review there will typically be multiple iterations of the branch to address feedback. The chain of previous branch headers allows reviewers to see how commits were changed to address feedback, interdiff style. The branch can be merged when the target header matches the current trunk and there are no conflicts left to resolve. When the time comes to merge the branch, there are several options: For a merge workflow, the cover letter is used to make a new commit on the trunk, changing the target header into the first parent commit, and dropping the previous branch header. Or, if you like to preserve more history, the previous branch chain can be retained. Or you can drop the cover letter and fast foward the branch on to the trunk. Or you can squash the branch on to the trunk, using the cover letter as the commit message. questions This is a fairly rough idea: I’m sure that some of the details won’t work in practice without a lot of careful work on compatibility and deployability. Do the new commit headers (“target” and “previous branch”) need to be headers? What are the compatibility issues with adding new headers that refer to other commits? How would a server handle a push of an unnamed branch? How could someone else pull a copy of it? How feasible is it to use cover letter subject lines instead of branch names? The previous branch header is doing a similar job to a remote tracking branch. Is there an opportunity to simplify how we keep a local cache of the server state? Despite all that, I think something along these lines could make branches / reviews / reworks / merges less awkward. How you merge should me a matter of your project’s preferred style, without interference from technical limitations that force you to trade off one annoyance against another. There remains a non-technical limitation: I have assumed that contributors are comfortable enough with version control to use a history-editing workflow effectively. I’ve lost all perspective on how hard this is for a newbie to learn; I expect (or hope?) jj makes it much easier than git rebase.

yesterday • 6 votes

ARM is great, ARM is terrible (and so is RISC-V)

I’ve long been interested in new and different platforms. I ran Debian on an Alpha back in the late 1990s and was part of the Alpha port team; then I helped bootstrap Debian on amd64. I’ve got somewhere around 8 Raspberry Pi devices in active use right now, and the free NNCPNET Internet email service … Continue reading ARM is great, ARM is terrible (and so is RISC-V) →

2 days ago • 4 votes

Many Hard Leetcode Problems are Easy Constraint Problems

In my first interview out of college I was asked the change counter problem: Given a set of coin denominations, find the minimum number of coins required to make change for a given number. IE for USA coinage and 37 cents, the minimum number is four (quarter, dime, 2 pennies). I implemented the simple greedy algorithm and immediately fell into the trap of the question: the greedy algorithm only works for "well-behaved" denominations. If the coin values were [10, 9, 1], then making 37 cents would take 10 coins in the greedy algorithm but only 4 coins optimally (10+9+9+9). The "smart" answer is to use a dynamic programming algorithm, which I didn't know how to do. So I failed the interview. But you only need dynamic programming if you're writing your own algorithm. It's really easy if you throw it into a constraint solver like MiniZinc and call it a day. int: total; array[int] of int: values = [10, 9, 1]; array[index_set(values)] of var 0..: coins; constraint sum (c in index_set(coins)) (coins[c] * values[c]) == total; solve minimize sum(coins); You can try this online here. It'll give you a prompt to put in total and then give you successively-better solutions: coins = [0, 0, 37]; ---------- coins = [0, 1, 28]; ---------- coins = [0, 2, 19]; ---------- coins = [0, 3, 10]; ---------- coins = [0, 4, 1]; ---------- coins = [1, 3, 0]; ---------- Lots of similar interview questions are this kind of mathematical optimization problem, where we have to find the maximum or minimum of a function corresponding to constraints. They're hard in programming languages because programming languages are too low-level. They are also exactly the problems that constraint solvers were designed to solve. Hard leetcode problems are easy constraint problems.1 Here I'm using MiniZinc, but you could just as easily use Z3 or OR-Tools or whatever your favorite generalized solver is. More examples This was a question in a different interview (which I thankfully passed): Given a list of stock prices through the day, find maximum profit you can get by buying one stock and selling one stock later. It's easy to do in O(n^2) time, or if you are clever, you can do it in O(n). Or you could be not clever at all and just write it as a constraint problem: array[int] of int: prices = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]; var int: buy; var int: sell; var int: profit = prices[sell] - prices[buy]; constraint sell > buy; constraint profit > 0; solve maximize profit; Reminder, link to trying it online here. While working at that job, one interview question we tested out was: Given a list, determine if three numbers in that list can be added or subtracted to give 0? This is a satisfaction problem, not a constraint problem: we don't need the "best answer", any answer will do. We eventually decided against it for being too tricky for the engineers we were targeting. But it's not tricky in a solver; include "globals.mzn"; array[int] of int: numbers = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]; array[index_set(numbers)] of var {0, -1, 1}: choices; constraint sum(n in index_set(numbers)) (numbers[n] * choices[n]) = 0; constraint count(choices, -1) + count(choices, 1) = 3; solve satisfy; Okay, one last one, a problem I saw last year at Chipy AlgoSIG. Basically they pick some leetcode problems and we all do them. I failed to solve this one: Given an array of integers heights representing the histogram's bar height where the width of each bar is 1, return the area of the largest rectangle in the histogram. The "proper" solution is a tricky thing involving tracking lots of bookkeeping states, which you can completely bypass by expressing it as constraints: array[int] of int: numbers = [2,1,5,6,2,3]; var 1..length(numbers): x; var 1..length(numbers): dx; var 1..: y; constraint x + dx <= length(numbers); constraint forall (i in x..(x+dx)) (y <= numbers[i]); var int: area = (dx+1)*y; solve maximize area; output ["(\(x)->\(x+dx))*\(y) = \(area)"] There's even a way to automatically visualize the solution (using vis_geost_2d), but I didn't feel like figuring it out in time for the newsletter. Is this better? Now if I actually brought these questions to an interview the interviewee could ruin my day by asking "what's the runtime complexity?" Constraint solvers runtimes are unpredictable and almost always than an ideal bespoke algorithm because they are more expressive, in what I refer to as the capability/tractability tradeoff. But even so, they'll do way better than a bad bespoke algorithm, and I'm not experienced enough in handwriting algorithms to consistently beat a solver. The real advantage of solvers, though, is how well they handle new constraints. Take the stock picking problem above. I can write an O(n²) algorithm in a few minutes and the O(n) algorithm if you give me some time to think. Now change the problem to Maximize the profit by buying and selling up to max_sales stocks, but you can only buy or sell one stock at a given time and you can only hold up to max_hold stocks at a time? That's a way harder problem to write even an inefficient algorithm for! While the constraint problem is only a tiny bit more complicated: include "globals.mzn"; int: max_sales = 3; int: max_hold = 2; array[int] of int: prices = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]; array [1..max_sales] of var int: buy; array [1..max_sales] of var int: sell; array [index_set(prices)] of var 0..max_hold: stocks_held; var int: profit = sum(s in 1..max_sales) (prices[sell[s]] - prices[buy[s]]); constraint forall (s in 1..max_sales) (sell[s] > buy[s]); constraint profit > 0; constraint forall(i in index_set(prices)) (stocks_held[i] = (count(s in 1..max_sales) (buy[s] <= i) - count(s in 1..max_sales) (sell[s] <= i))); constraint alldifferent(buy ++ sell); solve maximize profit; output ["buy at \(buy)\n", "sell at \(sell)\n", "for \(profit)"]; Most constraint solving examples online are puzzles, like Sudoku or "SEND + MORE = MONEY". Solving leetcode problems would be a more interesting demonstration. And you get more interesting opportunities to teach optimizations, like symmetry breaking. Because my dad will email me if I don't explain this: "leetcode" is slang for "tricky algorithmic interview questions that have little-to-no relevance in the actual job you're interviewing for." It's from leetcode.com. ↩

2 days ago • 6 votes

New here?