Large language models are a sustaining innovation for Siri

from Kevin Chen [alt+shift+b] in programming

Many people assume that large language models (LLMs) will disrupt existing consumer voice assistants. Compared to Siri, while today’s ChatGPT is largely unable to complete real-world tasks like hailing an Uber, it’s far better than Siri at understanding and generating language, especially in response to novel requests. From Tom’s Hardware, this captures the sentiment I see among tech commentators: GPT-4o will enable ChatGPT to become a legitimate Siri competitor, with real-time conversations via voice that are responded to instantly without lag time. […] ChatGPT’s new real-time responses make tools like Siri and Echo seem lethargic. And although ChatGPT likely won’t be able to schedule your haircuts like Google Assistant can, it did put up admirable real-time translating chops to challenge Google. Last year, there were rumors that OpenAI was working on its own hardware, which would open the possibility of integrating ChatGPT at the system level along the lines of the Humane Ai Pin....

a year ago

Remove from reading list Add to reading list [alt+a] Read now [→]

Improve your reading experience

Logged in users get linked directly to articles resulting in a better reading experience. Please login for free, it takes less than 1 minute.

More from Kevin Chen

Real estate is one of the hardest open problems in scaled self driving

I’ve had a minor obsession with Waymo’s autonomous vehicle depots recently. Over the past few months, I’ve flown a drone as part of a stakeout to understand how they work. And I’ve taken a deep dive into an apparent Waymo outage to find the company charging its electric vehicles from temporary diesel generators. The reason for my obsession? I believe depot buildouts will be one of the last hard problems in scaled autonomous driving. Long after the hardware, software, and AI have been perfected, real estate acquisition will remain a limiting factor in large-scale AV deployment. Waymo’s main depot at 201 Toland Street, San Francisco. Will self driving follow software scaling laws? In 2021, Elon Musk claimed that Tesla FSD’s release will be “one of the biggest asset value increases in history.” The day FSD goes to wide release will be one of the biggest asset value increases in history — Elon Musk (@elonmusk) October 20, 2021 Musk is arguing that, once autonomous driving has been solved, it can be instantly rolled out at the push of a button. Nearly all of Tesla’s fleet could be put to productive use without humans behind the wheel. While Musk’s viewpoint is on the extreme end, it’s a sentiment shared by many who have worked on or invested in autonomous driving over the years. Once you have hardware capable of supporting safe driverless operation, it’s just a matter of developing the right software. Software can be replicated infinitely at zero marginal cost. Could autonomous driving therefore scale as quickly as software platforms like Uber or DoorDash? The answer is not so simple. Self-driving cars are still cars — cars that exist in the physical world and need to be parked, fueled, cleaned, and repaired. Uber and other multi-sided marketplace platforms have been able to grow exponentially because they distribute these responsibilities to the individual drivers — many Uber drivers park at their own homes — allowing the platform provider to focus on developing the software pieces. So far, AV companies like Waymo and Cruise have taken a different approach. They’ve preferred to centralize these operational tasks in large depots staffed with their own personnel. This is because AV technology is still maturing and cannot be easily productized in the short term. Additionally, Timothy B. Lee notes in Understanding AI that “having hardware, software, and support services all under one roof makes it easier for Waymo to experiment with different technologies and business models.” When the kinks are still being worked out, it is more straightforward to vertically integrate everything in a single organization. The many jobs to be done of a robotaxi depot Depots for human-driven fleets, such as rental cars or delivery vans, only require a parking area with minimal additional infrastructure. This enables a fairly straightforward trade-off between location and cost: the fleet operator seeks a location close to customer demand while minimizing rent. For example, a logistics company participating in Amazon’s Delivery Service Partner program can run its depot from any sufficiently cheap parking lot near the local Amazon warehouse. The same constraints affect depot selection for autonomous vehicles. However, the depot also needs to be more than just a convenient parking lot to store off-duty cars. Because AVs are often also EVs, the ideal site also has electric vehicle charging. Because AVs need to upload driving logs to the cloud, it should have a high-speed Internet connection too. Let’s explore these constraints in detail. Location Depots should be placed close to customer demand to minimize deadheading (non-revenue driving), which would raise costs while degrading the customer experience with longer pickup times. Ideal depots are therefore located in desirable residential or commercial areas, where there is more competition among potential tenants. Placing depots in high-demand neighborhoods instead of industrial areas can also increase the probability of local opposition. Waymo has already encountered opposition during a proposed expansion of their main depot, even though it is located in an industrial neighborhood with many similar facilities. Again from Timothy B. Lee: Waymo sought a permit to convert the warehouse next door into some office space and a parking lot for Waymo employees. San Francisco’s Board of Supervisors unanimously rejected Waymo’s application. The rejection was partly based on fears that Waymo would eventually use the space to launch a delivery service in the city (Waymo hasn’t announced any plans to do this so far). But it also reflected city leaders’ frustration with their general lack of power over Waymo. Now consider the recent incident in which driverless Waymo vehicles honked at each other while entering a depot near residential buildings in San Francisco, often well into the early hours of the morning. While residents and the company resolved the situation amicably, it will surely be raised in future discussions of new Waymo depots in residential areas should they come before the Board of Supervisors or Planning Commission. Electric vehicle charging AV developers have preferred to run their services with electric vehicles. Although AV and EV technologies are not inherently coupled, running a fully electric fleet adds an environmental angle to the AV sales pitch, allowing the companies to claim that AV rides reduce emissions by displacing gas-powered driving. An EV fleet also lowers vehicle maintenance costs. Waymo and Cruise each have locations with DC fast charging capability. This approach avoids relying on public chargers. Taking Waymo’s primary San Francisco depot as an example, the company installed 38 chargers of approximately 60 kW each, implying a total site power of around 2.4 MW. Waymo vehicles charging in San Francisco. Approximately one-third of parking spots in the main depot have charging. Bringing in so many high-power chargers likely added significant complexity to Waymo’s depot construction. While we don’t Waymo’s process, we have a fairly good benchmark from the Tesla community, which tracks Tesla Supercharger installations closely. From Bruce Mah, a seasoned EV charging observer, the construction process is: As with any construction project, things usually start with selecting a site and permitting. There will often be some demolition / excavation of part of a parking lot (Superchargers are often built in existing parking lots). Tesla equipment such as charging cabinets, posts, etc. will usually be installed next (see T1 below). Eventually there will be some inspections from the local Authority Having Jurisdiction (AHJ). A utility transformer (from PG&E, SCE, etc.) is usually the last piece of equipment to be installed. Repaving, painting, and installation of parking stops will also usually happen late in the process, as well as landscaping and lighting enhancements. Of these steps, permitting and utility work are not within the charging operator’s control. California municipalities, especially San Francisco, have a notoriously slow and political permitting process. With PG&E, the utility serving much of the state, electrical service upgrades involving a new distribution transformer can take months. Timelines aside, building out a charging site is also expensive. For example, an agreement between Tesla and the City of West Hollywood values an eight-plug location at $482,942 for both equipment and construction. Data offload The final piece of the puzzle is data offload. Autonomous vehicles log vast amounts of data as they drive, measured in hundreds of GBs to TBs per hour. Some of the data is subject to mandatory retention and must be uploaded for later review. At a minimum, all AV collisions in California must be reported to the DMV. Regulators at all levels of government expect the AV developer to present analyses of serious incidents, including recordings from the vehicle and explanations of the AV’s decisions. In addition to regulatory requirements, the AV developer often wants to return much more data for engineering purposes: near misses, stuck events, novel or interesting scenarios, and more. The upshot is that the AV operator needs to upload a substantial portion of the hundreds of GBs to TBs logged per hour of driving. Uploading over cellular networks would not be cost effective. These transfers must occur at a depot. Today, it’s likely that Waymo and Cruise use disk swapping for data offload. When a car fills up its internal logging disk, it notifies an operator to plug in a fresh one. The full disks may be uploaded directly from the depot or shipped to a datacenter. This whole process is labor intensive and, over time, may pose a reliability concern due to dust or water ingress. A Waymo operator performs a possible disk swap. Many AV developers are moving toward direct data transfer from the vehicle using Ethernet, Wi-Fi, or a private 5G network, which reduces the number of manual touch points and moving parts. Charging is a great time to perform these transfers. However, this imposes an additional requirement on the depot: a fast upload speed, probably a fiber connection of at least 10 Gbps. Where do we go from here? When we put all three requirements together (great location, high-power EV charging, and high-speed Internet), there may be few to none sites that fit the bill. This would require the AV operator to take on site-specific construction projects to add amenities like charging and Internet — a strategy that sits in direct opposition to rapid and cost-effective scaling. Another possibility is to engineer ways to relax the constraints. Decoupling the requirements Waymo and Cruise do not require all of their locations to have charging and data offload. For example, Waymo operates satellite lots in downtown San Francisco only for storing their off-hail vehicles. Every night, fleet management software instructs the cars to travel back to the main depot for charging and data offload. A Waymo satellite location in San Francisco with minimal staffing, no charging, and apparently no data offloading. This solution works as long as the total charging and data transfer capacity across all locations exceeds the average throughput required to keep the fleet in working order. However, the lack of redundancy can lead to cascading failures, such as the apparent power outage at Waymo’s main depot that led the company to shut down many vehicles during a Friday evening rush hour. Reducing charging power Waymo and Cruise currently use DC fast charging (DCFC) for their fleets. Level 2 (L2) or AC charging could reduce the cost of buildouts because the equipment is cheaper and can often be added without bringing a new utility transformer. This could enable overnight charging in satellite locations that do not currently have any charging capacity. Imagine an operator showing up to plug in all the cars at night when there is little demand, then returning in the morning to unplug them. There is an order of magnitude speed difference between L2 and DCFC. This is important for consumer charging, where the consumer cares about the time to get a single car back on the road. However, charging power for any individual car becomes less important when charging a large fleet. Fleet operators care about the total throughput of turning around cars, which is proportional to total power delivered across all chargers. In other words, assuming an autonomous ride hailing service will always have overnight lulls in demand and enough parking spots during those times, the most scalable strategy is to procure your desired total charging power at the lowest price. DCFC equipment costs disproportionately more per kW due to the additional complexity of the charging equipment — and that doesn’t include the additional maintenance complexity. The table below compares ChargePoint’s cheapest L2 and DCFC units: Charger Power (kW) Price ($) Unit Price ($/kW) ChargePoint CPF50 9.6 kW $1,299 $135/kW ChargePoint CPE250 62.5 kW $52,000 $832/kW In addition to more scalable depot buildouts, reduced charging power can also increase the longevity of the vehicle’s traction battery, which is an important factor in managing vehicle depreciation. Reducing data logging rate Most AV developers start out by logging and uploading all data generated on their vehicles. This makes development easy because the data is always there when you need it. These assumptions need to be broken when a growing fleet generates proportionally more logs, most of which contain routine driving and are not very interesting. We can split the data logged by AVs into two categories: Raw sensor data, such as lidar point clouds, camera images, radar returns, and audio. Derived data, such as detections from the perception system or motion plans from the behavior system. One approach is to keep only one category of data. Retaining only the derived data can still enable debugging of serious incidents, as long as the perception system can be trusted to provide a faithful representation of the raw sensor data. On the other hand, retaining only the raw sensor data makes the logs more useful for developing the mapping and perception system. Similar-looking derived data can be generated by running a replay simulator as needed, but it is challenging to reproduce the exact same outputs as those on the vehicle unless the AV software is fully deterministic. Data retention decisions can also be made temporally. The key challenge here is high-recall classification of which time ranges in the log must be retained. For example, if a DMV-reportable collision occurs, the associated log data must never be discarded. These decisions can happen either on-device or in the cloud, but they must be made without uploading the full log to the cloud, since our bottleneck is the connection from the vehicle to the Internet. Conclusion The current trajectory for scaled autonomous driving would require desirable depot locations to include charging and Internet, making real estate acquisition challenging. There exist opportunities to reduce the additional requirements over time with the goal of making the problem closer to “rent a bunch of conveniently located parking lots.” While these are not traditionally considered autonomous driving problems, solving them will be key to unlocking the next phase of scaling.

10 months ago • 76 votes

How autonomous vehicle simulation works

When autonomous vehicle developers justify the safety of their driverless vehicle deployments, they lean heavily on their testing in simulation. Common talking points take the form of “we made our car drive X billion miles in simulation.” From these vague statements, it’s challenging to determine what a simulator is, or how it works. There’s more to simulation than endless driving in a virtual environment. For example, Waymo’s technology overview page says (emphasis mine): We’ve driven more than 20 billion miles in simulation to help identify the most challenging situations our vehicles will encounter on public roads. We can either replay and tweak real-world miles or build completely new virtual scenarios, for our autonomous driving software to practice again and again. Cruise’s safety page contains similar language:1 Before setting out on public roads, Cruise vehicles complete more than 250,000 simulations and closed course testing during everyday and extreme conditions. The main impression one gets from these overviews is that (1) simulation can test many driving scenarios, and (2) everyone will be super impressed if you use it a lot. Going one layer deeper to the few blog posts and talks full of slick GIFs, you might reach the conclusion that simulation is like a video game for the autonomous vehicle in the vein of Grand Theft Auto (GTA): a fully generated 3D environment complete with textures, lighting, and non-player characters (NPCs). Much like human players of GTA, the autonomous vehicle would be able to drive however it likes, freed from real-world consequences. Source: Cruise. While this type of fully synthetic simulation exists in the world of autonomous driving, it’s actually the least commonly used type of simulation.2 Instead, just as a software developer leans on many kinds of testing before releasing an application, an AV developer runs many types of simulation before deploying an autonomous vehicle. Each type of simulation is best suited for a particular use case, with trade-offs between realism, coverage, technical complexity, and cost to operate. In this post, we’ll walk through the system design of a simulator at a hypothetical AV company, starting from first principles. We may never know the details of the actual simulator architecture used by any particular AV developer. However, by exploring the design trade-offs from first principles, I hope to shed some light on how this key system works. Contents Our imaginary self-driving car Replay simulation Interactivity and the pose divergence problem Synthetic simulation The high cost of realistic imagery Round-trip conversions to pixels and back Skipping the sensor data Making smart agents Generating scene descriptions Limitations of pure synthetic simulation Hybrid simulation Conclusion Our imaginary self-driving car Let’s begin by defining our hypothetical autonomous driving software, which will help us illustrate how simulation fits into the development process. Imagine it’s 2015, the peak of self-driving hype, and our team has raised a vast sum of money to develop an autonomous vehicle. Like a human driver, our software drives by continuously performing a few basic tasks: It makes observations about the road and other road users. It reasons about what others might do and plans how it should drive. Finally, it executes those planned motions by steering, accelerating, and braking. Rinse and repeat. This mental model helps us group related code into modules, enabling them to be developed and tested independently. There will be four modules in our system:3 Sensor Interface: Take in raw sensor data such as camera images and lidar point clouds. Sensing: Detect objects such as vehicles, pedestrians, lane lines, and curbs. Behavior: Determine the best trajectory (path) for the vehicle to drive. Vehicle Interface: Convert the trajectory into steering, accelerator, and brake commands to control the vehicle’s drive-by-wire (DBW) system. We connect our modules to each other using an inter-process communication framework (“middleware”) such as ROS, which provides a publish–subscribe system (pubsub) for our modules to talk to each other. Here’s a concrete example of our module-based encapsulation system in action: The sensing module publishes a message containing the positions of other road users. The behavior module subscribes to this message when it wants to know whether there are pedestrians nearby. The behavior module doesn’t know and doesn’t care how the perception module detected those pedestrians; it just needs to see a message that conforms to the agreed-upon API schema. Defining a schema for each message also allows us to store a copy of everything sent through the pubsub system. These driving logs will come in handy for debugging because it allows us to inspect the system with module-level granularity. Our full system looks like this: Simplified architecture diagram for an autonomous vehicle. Now it’s time to take our autonomous vehicle for a spin. We drive around our neighborhood, encountering some scenarios in which our vehicle drives incorrectly, which cause our in-car safety driver to take over driving from the autonomous vehicle. Each disengagement gets reviewed by our engineering team. They analyze the vehicle’s logs and propose some software changes. Now we need a way to prove our changes have actually improved performance. We need the ability to compare the effectiveness of multiple proposed fixes. We need to do this quickly so our engineers can receive timely feedback. We need a simulator! Replay simulation Motivated by the desire to make progress quickly, we try the simplest solution first. The key insight: our software modules don’t care where the incoming messages come from. Could we simulate a past scenario by simply replaying messages from our log as if they were being sent in real time? As the name suggests, this is exactly how replay simulation works. Under normal operation, the input to our software is sensor data captured from real sensors. The simulator replaces this by replaying sensor data from an existing log. Under normal operation, the output of our software is a trajectory (or a set of accelerator and steering commands) that the real car executes. The simulator intercepts the output to control the simulated vehicle’s position instead. Modified architecture diagram for running replay simulation. There are two primary ways we can use this type of simulator, depending on whether we use a different software version as the onroad drive: Different software: By running modified versions of our modules in the simulator, we can get a rough idea of how the changes will affect the vehicle’s behavior. This can provide early feedback on whether a change improves the vehicle’s behavior or successfully fixes a bug. Same software: After a disengagement, we may want to know what would have happened if the autonomous vehicle were allowed to continue driving without human input. Simulation can provide this counterfactual by continuing to play back messages as if the disengagement never happened. We’ve gained these important testing capabilities with relatively little effort. Rather than take on the complexity of a fully generated 3D environment, we got away with a few modifications to our pubsub framework. Interactivity and the pose divergence problem The simplicity of a pure replay simulator also leads to its key weakness: a complete lack of interactivity. Everything in the simulated environment was loaded verbatim from a log. Therefore, the environment does not respond to the simulated vehicle’s behavior, which can lead to unrealistic interactions with other road users. This classic example demonstrates what can happen when the simulated vehicle’s behavior changes too much: Watch on YouTube. Dragomir Anguelov’s guest lecture at MIT. Source: Lex Fridman. Our vehicle, when it drove in the real world, was where the green vehicle is. Now, in simulation, we drove differently and we have the blue vehicle. So we’re driving…bam. What happened? Well, there is a purple agent over there — a pesky purple agent — who, in the real world, saw that we passed them safely. And so it was safe for them to go, but it’s no longer safe, because we changed what we did. So the insight is: in simulation, our actions affect the environment and needed to be accounted for. Anguelov’s video shows the simulated vehicle driving slower than the real vehicle. This kind of problem is called pose divergence, a term that covers any simulation where differences in the simulated vehicle’s driving decisions cause its position to differ from the real-world vehicle’s position. In the video, the pose divergence leads to an unrealistic collision in simulation. A reasonable driver in the purple vehicle’s position would have observed the autonomous vehicle and waited for it to pass before entering the intersection.4 However, in replay simulation, all we can do is play back the other driver’s actions verbatim. In general, problems arising from the lack of interactivity mean the simulated scenario no longer provides useful feedback to the AV developer. This is a pretty serious limitation! The whole point of the simulator is to allow the simulated vehicle to make different driving decisions. If we cannot trust the realism of our simulations anytime there is an interaction with another road user, it rules out a lot of valuable use cases. Synthetic simulation We can solve these interactivity problems by using a simulated environment to generate synthetic inputs that respond to our vehicle’s actions. Creating a synthetic simulation usually starts with a high-level scene description containing: Agents: fully interactive NPCs that react to our vehicle’s behavior. Environments: 3D models of roads, signs, buildings, weather, etc. that can be rendered from any viewpoint. From the scene description, we can generate different types of synthetic inputs for our vehicle to be injected at different layers of its software stack, depending on which modules we want to test. In synthetic sensor simulation, the simulator uses a game engine to render the scene description into fake sensor data, such as camera images, lidar point clouds, and radar returns. The simulator sets up our software modules to receive the generated imagery instead of sensor data logged from real-world driving. Modified architecture diagram for running synthetic simulation with generated sensors. The same game engine can render the scene from any arbitrary perspective, including third-person views. This is how they make all those slick highlight reels. The high cost of realistic imagery Simulations that generate fake sensor data can be quite expensive, both to develop and to run. The developer needs to create a high-quality 3D environment with realistic object models and lighting rivaling AAA games. Example of Cruise’s synthetic simulation showing the same scene rendered into synthetic camera, lidar, and radar data. Source: Cruise. For example, a Cruise blog post mentions some elements of their synthetic simulation roadmap (emphasis mine): With limited time and resources, we have to make choices. For example, we ask how accurately we should model tires, and whether or not it is more important than other factors we have in our queue, like modeling LiDAR reflections off of car windshields and rearview mirrors or correctly modeling radar multipath returns. Even if rendering reflections and translucent surfaces is already well understood in computer graphics, Cruise may still need to make sure their renderer generates realistic reflections that resemble their lidar. This challenge gives a sense of the attention to detail required. It’s only one of many that needs to be solved when building a synthetic sensor simulator. So far, we have only covered the high development costs. Synthetic sensor simulation also incurs high variable costs every time simulation is run. Round-trip conversions to pixels and back By its nature, synthetic sensor simulation performs a round-trip conversion to and from synthetic imagery to test the perception system. The game engine first renders its scene description to synthetic imagery for each sensor on the simulated vehicle, burning many precious GPU-hours in the process, only to have the perception system perform the inverse operation when it detects the objects in the scene to produce the autonomous vehicle’s internal scene representation.5 Every time you launch a synthetic sensor simulation, NVIDIA, Intel, and/or AWS are laughing all the way to the bank. Despite the expense of testing the perception system with synthetic simulation, it is also arguably less effective than testing with real-world imagery paired with ground truth labels. With real imagery, there can be no question about its realism. Synthetic imagery never looks quite right. These practical limitations mean that synthetic sensor simulation ends up as the least used simulator type in AV companies. Usually, it’s also the last type of simulator to be built at a new company. Developers don’t need synthetic imagery most of the time, especially when they have at their disposal a fleet of vehicles that can record the real thing. On the other hand, we cannot easily test risky driving behavior in the real world. For example, it is better to synthesize a bunch of red light runners than try to find them in the real world. This means we are primarily using synthetic simulation to test the behavior system. Skipping the sensor data In synthetic agent simulation, the simulator uses a high-level scene description to generate synthetic outputs from the perception/sensing system. In software development terms, it’s like replacing the perception system with a mock to focus on testing downstream components. This type of simulation requires fewer computational resources to run because the scene description doesn’t need to make a round-trip conversion to sensor data. Modified architecture diagram for running synthetic simulation with generated agents. With image quality out of the picture, the value of synthetic simulation rests solely on the quality of the scenarios it can create. We can split this into two main challenges: designing agents with realistic behaviors generating the scene descriptions containing various agents, street layouts, and environmental conditions Making smart agents You could start developing the control policy for a smart agent similar to NPC design in early video games. A basic smart agent could simply follow a line or a path without reacting to anyone else, which could be used to test the autonomous vehicle’s reaction to a right of way violation. A fancier smart agent could follow a path while also maintaining a safe following distance from the vehicle in front. This type of agent could be placed behind our simulated vehicle, resolving the rear-ending problem mentioned above. Like an audience of demanding gamers, the users of our simulator quickly expect increasingly complex and intelligent behaviors from the smart agents. An ideal smart agent system would capture the full spectrum of every action that other road users could possibly take. This system would also generate realistic behaviors, including realistic-looking trajectories and reaction times, so that we can trust the outcomes of simulations involving smart agents. Finally, our smart agents need to be controllable: they can be given destinations or intents, enabling developers to design simulations that test specific scenarios. Watch on YouTube. Two Cruise simulations in which smart agents (orange boxes) interact with the autonomous vehicle. In the second simulation, two parked cars have been inserted into the bottom of the visualization. Notice how the smart agents and the autonomous vehicle drive differently in the two simulations as they interact with each other and the additional parked cars. Source: Cruise. Developing a great smart agent policy ends up falling in the same difficulty ballpark as developing a great autonomous driving policy. The two systems may even share technical foundations. For example, they may have a shared component that is trained to predict the behaviors of other road users, which can be used for both planning our vehicle’s actions and for generating realistic agents in simulation. Generating scene descriptions Even with the ability to generate realistic synthetic imagery and realistic smart agent behaviors, our synthetic simulation is not complete. We still need a broad and diverse dataset of scene descriptions that can thoroughly test our vehicle. These scene descriptions usually come from a mix of sources: Automatic conversion from onroad scenarios: We can write a program that takes a logged real-world drive, guesses the intent of other road users, and stores those intents as a synthetic simulation scenario. Manual design: Analogous to a level editor in a video game. A human either builds the whole scenario from scratch or makes manual edits to an automatic conversion. For example, a human can design a scenario based on a police report of a human-on-human-driver collision to simulate what the vehicle might have done in that scenario. Generative AI: Recent work from Zoox uses diffusion models trained on a large dataset of onroad scenarios. Example of a real-world log (top) converted to a synthetic simulation scenario, then rendered into synthetic camera images (bottom). Notice how some elements, such as the protest signs, are not carried over, perhaps because they are not supported by the perception system or the scene converter. Source: Cruise. Scenarios can also be fuzzed, where the simulator adds random noise to the scene parameters, such as the speed limit of the road or the goals of simulated agents. This can upsample a small number of converted or manually designed scenes to a larger set that can be used to check for robustness and prevent overfitting. Fuzzing can also help developers understand the space of possible outcomes, as shown in the example below, which fuzzes the reaction time of a synthetic tailgater: An example of fuzzing tailgater reaction time. Source: Waymo. The distribution on the right shows a dot for each variant of the scenario, colored green or red depending on whether a simulated collision occurred. In this experiment, the collision becomes unavoidable once the simulated tailgater’s reaction time exceeds about 1 second. Limitations of pure synthetic simulation With these sources plus fuzzing, we’ve ensured the quantity of scenarios in our library, but we still don’t have any guarantees on the quality. Perhaps the scenarios we (and maybe our generative AI tools) invent are too hard or too easy compared to the distribution of onroad driving our vehicle encounters. If our vehicle drives poorly in a synthetic scenario, does the autonomous driving system need improvement? Or is the scenario unrealistically hard, perhaps because the behavior of its smart agents is too unreasonable? If our vehicle passes with flying colors, is it doing a good job? Or is the scenario library missing some challenging scenarios simply because we did not imagine that they could happen? This is a fundamental problem of pure synthetic simulation. Once we start modifying and fuzzing our simulated scenarios, there isn’t a straightforward way to know whether they remain representative of the real world. And we still need to collect a large quantity of real-world mileage to ensure that we have not missed any rare scenarios. Hybrid simulation We can combine our two types of simulator into a hybrid simulator that takes advantages of the strengths of each, providing an environment that is both realistic and interactive without breaking the bank. From replay simulation, use log replay to ensure every simulated scenario is rooted in a real-world scenario and has perfectly realistic sensor data. From synthetic simulation, make the simulation interactive by selectively replacing other road users with smart agents if they could interact with our vehicle.6 Modified architecture diagram merging parts of replay and synthetic simulation. Hybrid simulation usually serves as the default type of simulation that works well for most use cases. One convenient interpretation is that hybrid simulation is a worry-free replacement for replay simulation: anytime the developer would have used replay, they can absentmindedly switch to hybrid simulation to take care of the most common simulation artifacts while retaining most of the benefits of replay simulation. Conclusion We’ve seen that there are many types of simulation used in autonomous driving. They exist on a spectrum from purely replaying onroad scenarios to fully synthesized environments. The ideal simulation platform allows developers to pick an operating point on that spectrum that fits their use case. Hybrid simulation based on a large volume of real-world miles satisfies most testing needs at a reasonable cost, while fully synthetic modes serve niche use cases that can justify the higher development and operating costs. Cruise has written several deep dives about the usage and scaling of their simulation platform. However, neither Cruise nor Waymo provide many details on the construction of their simulator. ↩ I’ve even heard arguments that it’s only good for making videos. ↩ There exist architectures that are more end-to-end. However, to the best of my knowledge, those systems do not have driverless deployments with nontrivial mileage, making simulation testing less relevant. ↩ Another interactivity problem arises from the replay simulator’s inability to simulate different points of view as the simulated vehicle moves. A large pose divergence often causes the simulated vehicle to drive into an area not observed by the vehicle that produced the onroad log. For example, a simulated vehicle could decide to drive around a corner much earlier. But it wouldn’t be able to see anything until the log data also rounds the corner. No matter where the simulated vehicle drives, it will always be limited to what the logged vehicle saw. ↩ “Computer vision is inverse computer graphics.” ↩ As a nice bonus, because the irrelevant road users are replayed exactly as they drove in real life, this may reduce the compute cost of simulation. ↩

a year ago • 59 votes

Why autonomous trucking is harder than autonomous rideshare

Recently, The Verge asked, “where are all the robot trucks?” It’s a good question. Trucking was supposed to be the ideal first application of autonomous driving. Freeways contain predictable, highly structured driving scenarios. An autonomous truck would not have to deal with the complexities of intersections and two-way traffic. It could easily drive hundreds of miles without encountering a single pedestrian. DALL-E 3 prompt: “Generate an artistic, landscape aspect ratio watercolor painting of a truck with a bright red cab, pulling a white trailer. The truck drives uphill on an empty, rural highway during wintertime, lined with evergreen trees and a snow bank on a foggy, cloudy day.” The trucks could also be commercially viable with only freeway driving capability, or freeways plus a short segment of surface streets needed to reach a transfer hub. The AV company would only need to deal with a limited set of businesses as customers, bypassing the messiness of supporting a large pool of consumers inherent to the B2C model. Autonomous trucks would not be subject to rest requirements. As The Verge notes, “truck operators are allowed to drive a maximum of 11 hours a day and have to take a 30-minute rest after eight consecutive hours behind the wheel. Autonomous trucks would face no such restrictions,” enabling them to provide a service that would be literally unbeatable by a human driver. If you had asked me in 2018, when I first started working in the AV industry, I would’ve bet that driverless trucks would be the first vehicle type to achieve a million-mile driverless deployment. Aurora even pivoted their entire company to trucking in 2020, believing it to be easier than city driving. Yet sitting here in 2024, we know that both Waymo and Cruise have driven millions of miles on city streets — a large portion in the dense urban environment of San Francisco — and there are no driverless truck deployments. What happened? I think the problem is that driverless autonomous trucking is simply harder than driverless rideshare. The trucking problem appears easier at the outset, and indeed many AV developers quickly reach their initial milestones, giving them false confidence. But the difficulty ramps up sharply when the developer starts working on the last bit of polish. They encounter thorny problems related to the high speeds on freeways and trucks’ size, which must be solved before taking the human out of the driver’s seat. What is the driverless bar? Here’s a simplistic framework: No driver in the vehicle. No guarantee of a timely response from remote operators or backend services. Therefore, all safety-critical decisions must be made by the onboard computer alone. Under these constraints, the system still meets or exceeds human safety level. This is a really, really high bar. For example, on surface streets, this means the system on its own is capable of driving at least 100k miles without property damage and 40M miles without fatality.1 The system can still have flaws, but virtually all of those problems must result in a lack of progress, rather than collision or injury. In short, while the system may not know the right thing to do in every scenario, it should never do the wrong thing. (There are several high quality safety frameworks for those interested in a rigorous definition.23 It’s beyond the scope of this post.) Now, let’s look at each aspect of trucking to see how it exacerbates these challenges. Truck-specific challenges Stopping distance vs. sensing range The required sensor capability for an autonomous vehicle is determined by the most challenging scenario that the vehicle needs to handle. A major challenge in trucking is stopping behind a stalled vehicle or large debris in a travel lane. To avoid collision, the autonomous vehicle would need a sensing range greater than or equal to its stopping distance. We’ll make a simplifying assumption that stopping distance defines the minimum detection range requirements. A driverless-quality perception system needs perfect recall on other vehicles within the vehicle’s worst-case stopping distance. Passenger vehicles can decelerate up to –8 m/s². Trucks can only achieve around –4 m/s², which increases the stopping distance and puts the sensing range requirement right at the edge of what today’s sensors can deliver. Here are the sight stopping distances for an empty truck in dry conditions on roads of varying grade:4 Speed (mph) 0% Grade (m) –3% Grade (m) –6% Grade (m) 50 115–141 124–150 136–162 70 122–178 136–162 236–305 Sight stopping distances defined as the distance needed to stop assuming a 2.5-second reaction time with no braking, followed by maximum braking. The distance is computed for an empty truck in dry conditions on roads of varying grade. Stopping distance increases in wet weather or when driving downhill with a load (not shown). Now let’s compare these distances with the capabilities of various sensors: Lidar sensors provide trustworthy 3D data because they take direct measurements based on physical principles. They have a usable range of around 200–250 meters, plenty for city driving but not enough for every truck use case. Lidar detection models may also need to accumulate multiple scans/frames over time to detect faraway objects reliably, especially for smaller items like debris, further decreasing the usable detection range. Note that some solid-state lidars claim significantly more range than 250 meters. These numbers are collected under ideal conditions; for computing minimum sensing capability, we are interested in the range that can provide perfect recall and really great precision. For example, the lidar may be unable to reach its maximum range over the entire field of view, or may require undesirable trade-offs like a scan pattern that reduces point density and field of view to achieve more range. Radar can see farther than lidar. For example, this high-end ZF radar claims vehicle detections up to 350 meters away. Radar is great for tracking moving vehicles, but has trouble distinguishing between stationary vehicles and other background objects. Tesla Autopilot has infamously shown this problem by braking for overpasses and running into stalled vehicles. “Imaging” radars like the ZF device will do better than the radars on production vehicles. They still do not have the azimuth resolution to separate objects beyond 200 meters, where radar input is most needed. Cameras can detect faraway objects as long as there are enough pixels on the object, which leads to the selection of cameras with high resolution and a narrow field of view (telephoto lens). A vehicle will carry multiple narrow cameras for full coverage during turns. However, cameras cannot measure distance or speed directly. A combined camera + radar system using machine learning probably has the best chance here, especially with recent advances in ML-based early fusion, but would need to perform well enough to serve as the primary detection source beyond 200 meters. Training such a model is closer to an open problem than simply receiving that data from a lidar. In summary, we don’t appear to have any sensing solutions with the performance needed for trucks to meet the driverless bar. Controls Controlling a passenger vehicle — determining the amount of steering and throttle input to make the vehicle follow a trajectory — is a simpler problem than controlling a truck. For example, passenger vehicles are generally modeled as a single rigid body, while a truck and its trailer can move separately. The planner and controller need to account for this when making sharp turns and, in extreme low-friction conditions, to avoid jackknifing. These features come in addition to all the usual controls challenges that also apply to passenger vehicles. They can be built but require additional development and validation time. Freeway-specific challenges OK, so trucks are hard, but what about the freeway part? It may now sound appealing to build L4 freeway autonomy for passenger vehicles. However, driving on freeways also brings additional challenges on top of what is needed for city streets. Achieving the minimal risk condition on freeways Autonomous vehicles are supposed to stop when they detect an internal fault or driving situation that they can’t handle. This is called the minimal risk condition (MRC). For example, an autonomous passenger vehicle that detects an error in the HD map or a sensor failure might be programmed to execute a pullover or stop in lane depending on the problem severity. While MRC behaviors are annoying for other road users and embarrassing for the AV developer, they do not add undue risk on surface streets given the low speeds and already chaotic nature of city driving. This gives the AV developer more breathing room (within reason) to deploy a system that does not know how to handle every driving scenario perfectly, but knows enough to stay out of trouble. It’s a different story on the freeway. Stopping in lane becomes much more dangerous with the possibility of a rear-end collision at high speed. All stopping should be planned well in advance, ideally exiting at the next ramp, or at least driving to the closest shoulder with enough room to park. This greatly increases the scope of edge cases that need to be handled autonomously and at freeway speeds. For example: Scene understanding: If the vehicle encounters an unexpected construction zone, crash site, or other non-nominal driving scenario, it’s not enough to detect and stop. Rerouting, while a viable option on surface streets, usually isn’t an option on freeways because it may be difficult or illegal to make a u-turn by the time the vehicle can see the construction. A freeway under construction is also more likely to be the only path to the destination, especially if the autonomous vehicle in question is not designed to drive on city streets. Operational solutions are also not enough for a scaled deployment. AV developers often disallow their vehicles from routing through known problem areas gathered from manually driven scouting vehicles or announcements made by authorities. For a scaled deployment, however, it’s not reasonable to know the status of every mile of road at all times. Therefore, the system needs to find the right path through unstructured scenarios, possibly following instructions from police directing traffic, even if it involves traffic violations such as driving on the wrong side of the road. We know that current state-of-the-art autonomous vehicles still occasionally drive into wet concrete and trenches, which shows it is nontrivial to make a correct decision. Mapping: If the lane lines have been repainted, and the system normally uses an HD map, it needs to ignore the map and build a new one on-the-fly from the perception system’s output. It needs to distinguish between mapping and perception errors. Uptime: Sensor, computer, and software failures need to be virtually eliminated through redundancy and/or engineering elbow grease. The system needs almost perfect uptime. For example, it’s fine to enter a max-braking MRC when losing a sensor or restarting a software module on surface streets, provided those failures are rare. The same maneuver would be dangerous on the freeway, so the failure must be eliminated, or a fallback/redundancy developed. These problems are not impossible to overcome. Every autonomous passenger vehicle has solved them to some extent, with the remaining edge cases punted to some combination of MRC and remote operators. The difference is that, on freeways, they need to be solved with a very high level of reliability to meet the driverless bar. Freeways are boring The features that make freeways simpler — controlled access, no intersections, one-way traffic — also make “interesting” events more rare. This is a double-edged sword. While the simpler environment reduces the number of software features to be developed, it also increases the iteration time and cost. During development, “interesting” events are needed to train data-hungry ML models. For validation, each new software version to be qualified for driverless operation needs to encounter a minimum number of “interesting” events before comparisons to a human safety level can have statistical significance. Overall, iteration becomes more expensive when it takes more vehicle-hours to collect each event. AV developers can only respond by increasing the size of their operations teams or accepting more time between software releases. (Note that simulation is not a perfect solution either. The rarity of events increases vehicle-hours run in simulation, and so far, nobody has shown a substitute for real-world miles in the context of driverless software validation.) Is it ever going to happen? Trucking requires longer range sensing and more complex controls, increasing system complexity and pushing the problem to the bleeding edge of current sensing capabilities. At the same time, driving on freeways brings additional reliability requirements, raising the quality bar on every software component from mapping to scene understanding. If both the truck form factor and the freeway domain increase the level of difficulty, then driverless trucking might be the hardest application of autonomous driving: City Freeway Cars Baseline Harder Trucks Harder Hardest Now that scaled rideshare is mostly working in cities, I expect to see scaled freeway rideshare next. Does this mean driverless trucking will never happen? No, I still believe AV developers will overcome these challenges eventually. Aurora, Kodiak, and Gatik have all promised some form of driverless deployment by the end of the year. We probably won’t see anything close to a million-mile deployment in 2024 though. Getting there will require advances in sensing, machine learning, and a lot of hard work. Thanks to Steven W. and others for the discussions and feedback. This should be considered a bare minimum because humans perform much better on freeways, raising the bar for AVs. Rough numbers taken from Table 3, passenger vehicle national average on surface streets: Scanlon, J. M., Kusano, K. D., Fraade-Blanar, L. A., McMurry, T. L., Chen, Y. H., & Victor, T. (2023). Benchmarks for Retrospective Automated Driving System Crash Rate Analysis Using Police-Reported Crash Data. arXiv preprint arXiv:2312.13228. (blog) ↩ Kalra, N., & Paddock, S. M. (2016). Driving to safety: How many miles of driving would it take to demonstrate autonomous vehicle reliability? Transportation Research Part A: Policy and Practice, 94, 182-193. ↩ Favaro, F., Fraade-Blanar, L., Schnelle, S., Victor, T., Peña, M., Engstrom, J., … & Smith, D. (2023). Building a Credible Case for Safety: Waymo’s Approach for the Determination of Absence of Unreasonable Risk. arXiv preprint arXiv:2306.01917. (blog) ↩ Computed from tables 1 and 2: Harwood, D. W., Glauz, W. D., & Mason, J. M. (1989). Stopping sight distance design for large trucks. Transportation Research Record, 1208, 36-46. ↩

a year ago • 30 votes

How Cruise vehicles return to the garage autonomously in heavy rain

Cruise doesn’t carry passengers in heavy rain. The operational design domain (ODD) in their CPUC permit (PDF) only allows services in light rain. I’ve always wondered how they implement this operationally. For example, Waymo preemptively launches all cars with operators in the driver’s seat anytime there’s rain in the forecast. Cruise has no such policy: I have never seen them assign operators to customer-facing vehicles. Yet Cruise claims to run up to 100 driverless vehicles concurrently. It would be impractical to dispatch a human driver to each vehicle whenever it starts raining. When the latest atmostpheric river hit San Francisco, I knew it was my chance to find out how it worked. Monitoring the Cruise app As the rain intensified, as expected, all cars disappeared from Cruise’s app and the weather pause icon appeared. But then something unusual happened. The app returned to its normal state. A few cars showed up near a hole in the geofence — and they were actually hailable. Visiting the garage I drove over to find that this street is the entrance to one of Cruise’s garages. The same location has been featured in Cruise executives’ past tweets promoting the service.1 Despite the heavy rain and gusts strong enough to blow my hat/jacket off, a steady stream of Cruise vehicles were returning themselves to the garage in driverless mode. Driverless Cruise vehicles enter the garage during heavy rain. A member of Cruise’s operations team enters the vehicle to drive it into the garage. In total, I observed: 8 driverless vehicles 1 manually driven vehicle 1 support vehicle (unmodified Chevy Bolt not capable of autonomous driving) Two vehicles skip the garage After the first six driverless vehicles returned, the next two kept driving past the garage. I followed them in my own car. They drove for about 16 minutes, handling large puddles and road spray without noticeable comfort issues. Eventually they looped back to the garage and successfully entered. A Cruise vehicle drives through a puddle during its detour. I’m not totally sure what happened here. I can think of two reasonable explanations: Boring: The cars missed the turn for some unknown reason. Exciting: Cruise has implemented logic to avoid overwhelming the operations team’s ability to put cars back in the garage. If there are too many vehicles waiting to return, subsequent cars take a detour to kill time instead of blocking the driveway. Key take-aways Cruise is capable of handling heavy rain in driverless mode. The majority of Cruise vehicles returned to the garage autonomously. This enables them to handle correlated events, such as rain, without deploying a large operations team. Cruise may have implemented “take a lap around the block” logic to avoid congestion at the garage entrance. I can’t find the timelapse of Cruise launching their driverless cars anymore. I’m pretty sure it was posted to Twitter. Please let me know if you have the link! Update: Link to tweet by @kvogt. ↩

over a year ago • 40 votes

More in programming

Increase software sales by 50% or more

This is re-post of How to Permanently Increase Your Sales by 50% or More in Only One Day article by Steve Pavlina Of all the things you can do to increase your sales, one of the highest leverage activities is attempting to increase your products’ registration rate. Increasing your registration rate from 1.0% to 1.5% means that you simply convince one more downloader out of every 200 to make the decision to buy. Yet that same tiny increase will literally increase your sales by a full 50%. If you’re one of those developers who simply slapped the ubiquitous 30-day trial incentive on your shareware products without going any further than that, then I think a 50% increase in your registration rate is a very attainable goal you can achieve if you spend just one full day of concentrated effort on improving your product’s ability to sell. My hope is that this article will get you off to a good start and get you thinking more creatively. And even if you fail, your result might be that you achieve only a 25% or a 10% increase. How much additional money would that represent to you over the next five years of sales? What influence, if any, did the title of this article have on your decision to read it? If I had titled this article, “Registration Incentives,” would you have been more or less likely to read it now? Note that the title expresses a specific and clear benefit to you. It tells you exactly what you can expect to gain by reading it. Effective registration incentives work the same way. They offer clear, specific benefits to the user if a purchase is made. In order to improve your registration incentives, the first thing you need to do is to adopt some new beliefs that will change your perspective. I’m going to introduce you to what I call the “lies of success” in the shareware industry. These are statements that are not true at all, but if you accept them as true anyway, you’ll achieve far better results than if you don’t. Rule 1: What you are selling is merely the difference between the shareware and the registered versions, not the registered version itself. Note that this is not a true statement, but if you accept it as true, you’ll immediately begin to see the weaknesses in your registration incentives. If there are few additional benefits for buying the full version vs. using the shareware version, then you aren’t offering the user strong enough incentives to make the full purchase. Rule 2: The sole purpose of the shareware version is to close the sale. This is our second lie of success. Note the emphasis on the word “close.” Your shareware version needs to act as a direct sales vehicle. It must be able to take the user all the way to the point of purchase, i.e. your online order form, ideally with nothing more than a few mouse clicks. Anything that detracts from achieving a quick sale is likely to hurt sales. Rule 3: The customer’s perspective is the only one that matters. Defy this rule at your peril. Customers don’t care that you spent 2000 hours creating your product. Customers don’t care that you deserve the money for your hard work. Customers don’t care that you need to do certain things to prevent piracy. All that matters to them are their own personal wants and needs. Yes, these are lies of success. Some customers will care, but if you design your registration incentives assuming they only care about their own self-interests, your motivation to buy will be much stronger than if you merely appeal to their sense of honesty, loyalty, or honor. Assume your customers are all asking, “What’s in it for me if I choose to buy? What will I get? How will this help me?” I don’t care if you’re selling to Fortune 500 companies. At some point there will be an individual responsible for causing the purchase to happen, and that individual is going to consider how the purchase will affect him/her personally: “Will this purchase get me fired? Will it make me look good in front of my peers? Will this make my job easier or harder?” Many shareware developers get caught in the trap of discriminating between honest and dishonest users, believing that honest users will register and dishonest ones won’t. This line of thinking will ultimately get you nowhere, and it violates the third lie of success. When you make a purchase decision, how often do you use honesty as the deciding factor? Do you ever say, “I will buy this because I’m honest?” Or do you consider other more selfish factors first, such as how it will make you feel to purchase the software? The truth is that every user believes s/he is honest, so no user applies the honesty criterion when making a purchase decision. Thinking of your users in terms of honest ones vs. dishonest ones is a complete waste of time because that’s not how users primarily view themselves. Rule 4: Customers buy on emotion and justify with fact. If you’re honest with yourself, you’ll see that this is how you make most purchase decisions. Remember the last time you bought a computer. Is it fair to say that you first became emotionally attached to the idea of owning a new machine? For me, it’s the feeling of working faster, owning the latest technology, and being more productive that motivates me to go computer shopping. Once I’ve become emotionally committed, the justifications follow: “It’s been two years since I’ve upgraded, it will pay for itself with the productivity boost I gain, I can easily afford it, I’ve worked hard and I deserve a new machine, etc.” You use facts to justify the purchase. Once you understand how purchase decisions are made, you can see that your shareware products need to first get the user emotionally invested in the purchase, and then you give them all the facts they need to justify it. Now that we’ve gotten these four lies of success out of the way, let’s see how we might apply them to create some compelling registration incentives. Let’s start with Rule 1. What incentives can be spawned from this rule? The common 30-day trial is one obvious derivative. If you are only selling the difference between the shareware and registered versions, then a 30-day trial implies that you are selling unlimited future days of usage of the program after the trial period expires. This is a powerful incentive, and it’s been proven effective for products that users will continue to use month after month. 30-day trials are easy for users to understand, and they’re also easy to implement. You could also experiment with other time periods such as 10 days, 14 days, or 90 days. The only way of truly knowing which will work best for your products is to experiment. But let’s see if we can move a bit beyond the basic 30-day trial here by mixing in a little of Rule 3. How would the customer perceive a 30-day trial? In most cases 30 days is plenty of time to evaluate a product. But in what situations would a 30-day trial have a negative effect? A good example is when the user downloads, installs, and briefly checks out a product s/he may not have time to evaluate right away. By the time the user gets around to fully evaluating it, the shareware version has already expired, and a sale may be lost as a result. To get around this limitation, many shareware developers have started offering 30 days of actual program usage instead of 30 consecutive days. This allows the user plenty of time to try out the program at his/her convenience. Another possibility would be to limit the number of times the program can be run. The basic idea is that you are giving away limited usage and selling unlimited usage of the program. This incentive definitely works if your product is one that will be used frequently over a long period of time (much longer than the trial period). The flip side of usage limitation is to offer an additional bonus for buying within a certain period of time. For instance, in my game Dweep, I offer an extra 5 free bonus levels to everyone who buys within the first 10 days. In truth I give the bonus levels to everyone who buys, but the incentive is real from the customer’s point of view. Remember Rule 3 - it doesn’t matter what happens on my end; it only matters what the customer perceives. Any customer that buys after the first 10 days will be delighted anyway to receive a bonus they thought they missed. So if your product has no time-based incentives at all, this is the first place to start. When would you pay your bills if they were never due, and no interest was charged on late payments? Use time pressure to your advantage, either by disabling features in the shareware version after a certain time or by offering additional bonuses for buying sooner rather than later. If nothing else and if it’s legal in your area, offer a free entry in a random monthly drawing for a small prize, such as one of your other products, for anyone who buys within the first X days. Another logical derivative of Rule 1 is the concept of feature limitation. On the crippling side, you can start with the registered version and begin disabling functionality to create the shareware version. Disabling printing in a shareware text editor is a common strategy. So is corrupting your program’s output with a simple watermark. For instance, your shareware editor could print every page with your logo in the background. Years ago the Association of Shareware Professionals had a strict policy against crippling, but that policy was abandoned, and crippling has been recognized as an effective registration incentive. It is certainly possible to apply feature limitation without having it perceived as crippling. This is especially easy for games, which commonly offer a limited number of playable levels in the shareware version with many more levels available only in the registered version. In this situation you offer the user a seemingly complete experience of your product in the shareware version, and you provide additional features on top of that for the registered version. Time-based incentives and feature-based incentives are perhaps the two most common strategies used by shareware developers for enticing users to buy. Which will work best for you? You will probably see the best results if you use both at the same time. Imagine you’re the end user for a moment. Would you be more likely to buy if you were promised additional features and given a deadline to make the decision? I’ve seen several developers who were using only one of these two strategies increase their registration rates dramatically by applying the second strategy on top of the first. If you only use time-based limitations, how could you apply feature limitation as well? Giving the user more reasons to buy will translate to more sales per download. One you have both time-based and feature-based incentives to buy, the next step is to address the user’s perceived risk by applying a risk-reversal strategy. Fortunately, the shareware model already reduces the perceived risk of purchasing significantly, since the user is able to try before buying. But let’s go a little further, keeping Rule 3 in mind. What else might be a perceived risk to the user? What if the user reaches the end of the trial period and still isn’t certain the product will do what s/he needs? What if the additional features in the registered version don’t work as the user expects? What can we do to make the decision to purchase safer for the user? One approach is to offer a money-back guarantee. I’ve been offering a 60-day unconditional money-back guarantee on all my products since January 2000. If someone asks for their money back for any reason, I give them a full refund right away. So what is my return rate? Well, it’s about 8%. Just kidding! Would it surprise you to learn that my return rate at the time of this writing is less than 0.2%? Could you handle two returns out of every 1000 sales? My best estimate is that this one technique increased my sales by 5-10%, and it only took a few minutes to implement. When I suggest this strategy to other shareware developers, the usual reaction is fear. “But everyone would rip me off,” is a common response. I suggest trying it for yourself on an experimental basis; a few brave souls have already tried it and are now offering money-back guarantees prominently. Try putting it up on your web site for a while just to convince yourself it works. You can take it down at any time. After a few months, if you’re happy with the results, add the guarantee to your shareware products as well. I haven’t heard of one bad outcome yet from those who’ve tried it. If you use feature limitation in your shareware products, another important component of risk reversal is to show the user exactly what s/he will get in the full version. In Dweep I give away the first five levels in the demo version, and purchasing the full version gets you 147 more levels. When I thought about this from the customer’s perspective (Rule 3), I realized that a perceived risk is that s/he doesn’t know if the registered version levels will be as fun as the demo levels. So I released a new demo where you can see every level but only play the first five. This lets the customer see all the fun that awaits them. So if you have a feature-limited product, show the customer how the feature will work. For instance, if your shareware version has printing disabled, the customer could be worried that the full version’s print capability won’t work with his/her printer or that the output quality will be poor. A better strategy is to allow printing, but to watermark the output. This way the customer can still test and verify the feature, and it doesn’t take much imagination to realize what the output will look like without the watermark. Our next step is to consider Rule 2 and include the ability close the sale. It is imperative that you include an “instant gratification” button in your shareware products, so the customer can click to launch their default web browser and go directly to your online order form. If you already have a “buy now” button in your products, go a step further. A small group of us have been finding that the more liberally these buttons are used, the better. If you only have one or two of these buttons in your shareware program, you should increase the count by at least an order of magnitude. The current Dweep demo now has over 100 of these buttons scattered throughout the menus and dialogs. This makes it extremely easy for the customer to buy, since s/he never has to hunt around for the ordering link. What should you label these buttons? “Buy now” or “Register now” are popular, so feel free to use one of those. I took a slightly different approach by trying to think like a customer (Rule 3 again). As a customer the word “buy” has a slightly negative association for me. It makes me think of parting with my cash, and it brings up feelings of sacrifice and pressure. The words “buy now” imply that I have to give away something. So instead, I use the words, “Get now.” As a customer I feel much better about getting something than buying something, since “getting” brings up only positive associations. This is the psychology I use, but at present, I don’t know of any hard data showing which is better. Unless you have a strong preference, trust your intuition. Make it as easy as possible for the willing customer to buy. The more methods of payment you accept, the better your sales will be. Allow the customer to click a button to print an order form directly from your program and mail it with a check or money order. On your web order form, include a link to a printable text order form for those who are afraid to use their credit cards online. If you only accept two or three major credit cards, sign up with a registration service to handle orders for those you don’t accept. So far we’ve given the customer some good incentives to buy, minimized perceived risk, and made it easy to make the purchase. But we haven’t yet gotten the customer emotionally invested in making the purchase decision. That’s where Rule 4 comes in. First, we must recognize the difference between benefits and features. We need to sell the sizzle, not the steak. Features describe your product, while benefits describe what the user will get by using your product. For instance, a personal information manager (PIM) program may have features such as daily, weekly, and monthly views; task and event timers; and a contact database. However, the benefits of the program might be that it helps the user be more organized, earn more money, and enjoy more free time. For a game, the main benefit might be fun. For a nature screensaver, it could be relaxation, beauty appreciation, or peace. Features are logical; benefits are emotional. Logical features are an important part of the sale, but only after we’ve engaged the customer’s emotions. Many products do a fair job of getting the customer emotionally invested during the trial period. If you have an addictive program or one that’s fun to use, such as a game, you may have an easy time getting the customer emotionally attached to using it because the experience is already emotional in nature. But whatever your product is, you can increase your sales by clearly illustrating the benefits of making the purchase. A good place to do this is in your nag screens. I use nag screens both before and after the program runs to remind the user of the benefits of buying the full version. At the very least, include a nag screen when the customer exits the program, so the last thing s/he sees will be a reminder of the product’s benefits. Take this opportunity to sell the user on the product. Don’t expect features like “customizable colors” to motivate anyone to buy. Paint a picture of what benefits the user will obtain with the full version. Will I save time? Will I have more fun? Will I live longer, save money, or feel better? The simple change from feature-oriented selling to benefit-oriented selling can easily double or triple your sales. Be sure to use this approach on your web site as well if you don’t already. Developers who’ve recently made the switch have been reporting some amazing results. If you’re drawing a blank when trying to come up with benefits for your products, the best thing you can do is to email some of your old customers and ask them why they bought your program. What did it do for them? I’ve done this and was amazed at the answers I got back. People were buying my games for reasons I’d never anticipated, and that told me which benefits I needed to emphasize in my sales pitch. The next key is to make your offer irresistible to potential customers. Find ways to offer the customer so much value that it would be harder to say no than to say yes. Take a look at your shareware product as if you were a potential customer who’d never seen it before. Being totally honest with yourself, would you buy this program if someone else had written it? If not, don’t stop here. As a potential customer, what additional benefits or features would put you over the top and convince you to buy? More is always better than less. In the original version of Dweep, I offered ten levels in the demo and thirty in the registered version. Now I offer only five demo levels and 152 in the full version, plus a built-in level editor. Originally, I offered the player twice the value of the demo; now I’m offering over thirty times the value. I also offer free hints and solutions to every level; the benefit here is that it minimizes player frustration. As I keep adding bonuses for purchasing, the offer becomes harder and harder to resist. What clever bonuses can you throw in for registering? Take the time to watch an infomercial. Notice that there is always at least one “FREE” bonus thrown in. Consider offering a few extra filters for an image editor, ten extra images for a screensaver, or extra levels for a game. What else might appeal to your customers? Be creative. Your bonus doesn’t even have to be software-based. Offer a free report about building site traffic with your HTML editor, include an essay on effective time management with your scheduling program, or throw in a small business success guide with your billing program. If you make such programs, you shouldn’t have too much trouble coming up with a few pages of text that would benefit your customers. Keep working at it until your offer even looks irresistible to you. If all the bonuses you offer can be delivered electronically, how many can you afford to include? If each one only gains one more customer in a thousand (0.1%), would it be worth the effort over the lifetime of your sales? So how do you know if your registration incentives are strong enough? And how do you know if your product is over-crippled? Where do you draw the line? These are tough issues, but there is a good way to handle them if your product is likely to be used over a long period of time, particularly if it’s used on a daily basis. Simply make your program gradually increase its registration incentives over time. One easy way to do this is with a delay timer on your nag screens that increases each time the program is run. Another approach is to disable certain features at set intervals. You begin by disabling non-critical features and gradually move up to disabling key functionality. The program becomes harder and harder to continue using for free, so the benefits of registering become more and more compelling. Instead of having your program completely disable itself after your trial period, you gradually degrade its usability with additional usage. This approach can be superior to a strict 30-day trial, since it allows your program to still be used for a while, but after prolonged usage it becomes effectively unusable. However, you don’t simply shock the user by taking away all the benefits s/he has become accustomed to on a particular day. Instead, you begin with a gentle reminder that becomes harder and harder to ignore. There may be times when your 30-day trial shuts off at an inconvenient time for the user, and you may lose a sale as a result. For instance, the user may not have the money at the time, or s/he may be busy at the trial’s end and forget to register. In that case s/he may quickly replace what was lost with a competitor’s trial version. The gradual degradation approach allows the user to continue using your product, but with increasing difficulty over time. Eventually, there is a breaking point where the user either decides to buy or to stop using the program completely, but this can be done within a window of time at the user’s convenience. Hopefully this article has gotten you thinking creatively about all the overlooked ways you can entice people to buy your shareware products. The most important thing you can do is to begin seeing your products through your customers’ eyes. What additional motivation would convince you to buy? What would represent an irresistible offer to you? There is no limit to how many incentives you can add. Don’t stop at just one or two; instead, give the customer a half dozen or more reasons to buy, and you’ll see your registration rate soar. Is it worth spending a day to do this? I think so.

yesterday • 4 votes

Maybe writing speed actually is a bottleneck for programming

I'm a big (neo)vim buff. My config is over 1500 lines and I regularly write new scripts. I recently ported my neovim config to a new laptop. Before then, I was using VSCode to write, and when I switched back I immediately saw a big gain in productivity. People often pooh-pooh vim (and other assistive writing technologies) by saying that writing code isn't the bottleneck in software development. Reading, understanding, and thinking through code is! Now I don't know how true this actually is in practice, because empirical studies of time spent coding are all over the place. Most of them, like this study, track time spent in the editor but don't distinguish between time spent reading code and time spent writing code. The only one I found that separates them was this study. It finds that developers spend only 5% of their time editing. It also finds they spend 14% of their time moving or resizing editor windows, so I don't know how clean their data is. But I have a bigger problem with "writing is not the bottleneck": when I think of a bottleneck, I imagine that no amount of improvement will lead to productivity gains. Like if a program is bottlenecked on the network, it isn't going to get noticeably faster with 100x more ram or compute. But being able to type code 100x faster, even with without corresponding improvements to reading and imagining code, would be huge. We'll assume the average developer writes at 80 words per minute, at five characters a word, for 400 characters a minute.What could we do if we instead wrote at 8,000 words/40k characters a minute? Writing fast Boilerplate is trivial Why do people like type inference? Because writing all of the types manually is annoying. Why don't people like boilerplate? Because it's annoying to write every damn time. Programmers like features that help them write less! That's not a problem if you can write all of the boilerplate in 0.1 seconds. You still have the problem of reading boilerplate heavy code, but you can use the remaining 0.9 seconds to churn out an extension that parses the file and presents the boilerplate in a more legible fashion. We can write more tooling This is something I've noticed with LLMs: when I can churn out crappy code as a free action, I use that to write lots of tools that assist me in writing good code. Even if I'm bottlenecked on a large program, I can still quickly write a script that helps me with something. Most of these aren't things I would have written because they'd take too long to write! Again, not the best comparison, because LLMs also shortcut learning the relevant APIs, so also optimize the "understanding code" part. Then again, if I could type real fast I could more quickly whip up experiments on new apis to learn them faster. We can do practices that slow us down in the short-term Something like test-driven development significantly slows down how fast you write production code, because you have to spend a lot more time writing test code. Pair programming trades speed of writing code for speed of understanding code. A two-order-of-magnitude writing speedup makes both of them effectively free. Or, if you're not an eXtreme Programming fan, you can more easily follow the The Power of Ten Rules and blanket your code with contracts and assertions. We could do more speculative editing This is probably the biggest difference in how we'd work if we could write 100x faster: it'd be much easier to try changes to the code to see if they're good ideas in the first place. How often have I tried optimizing something, only to find out it didn't make a difference? How often have I done a refactoring only to end up with lower-quality code overall? Too often. Over time it makes me prefer to try things that I know will work, and only "speculatively edit" when I think it be a fast change. If I could code 100x faster it would absolutely lead to me trying more speculative edits. This is especially big because I believe that lots of speculative edits are high-risk, high-reward: given 50 things we could do to the code, 49 won't make a difference and one will be a major improvement. If I only have time to try five things, I have a 10% chance of hitting the jackpot. If I can try 500 things I will get that reward every single time. Processes are built off constraints There are just a few ideas I came up with; there are probably others. Most of them, I suspect, will share the same property in common: they change the process of writing code to leverage the speedup. I can totally believe that a large speedup would not remove a bottleneck in the processes we currently use to write code. But that's because those processes are developed work within our existing constraints. Remove a constraint and new processes become possible. The way I see it, if our current process produces 1 Utils of Software / day, a 100x writing speedup might lead to only 1.5 UoS/day. But there are other processes that produce only 0.5 UoS/d because they are bottlenecked on writing speed. A 100x speedup would lead to 10 UoS/day. The problem with all of this that 100x speedup isn't realistic, and it's not obvious whether a 2x improvement would lead to better processes. Then again, one of the first custom vim function scripts I wrote was an aid to writing unit tests in a particular codebase, and it lead to me writing a lot more tests. So maybe even a 2x speedup is going to be speed things up, too. Patreon Stuff I wrote a couple of TLA+ specs to show how to model fork-join algorithms. I'm planning on eventually writing them up for my blog/learntla but it'll be a while, so if you want to see them in the meantime I put them up on Patreon.

2 days ago • 6 votes

Occupation and Preoccupation

Here’s Jony Ive in his Stripe interview: What we make stands testament to who we are. What we make describes our values. It describes our preoccupations. It describes beautiful succinctly our preoccupation. I’d never really noticed the connection between these two words: occupation and preoccupation. What comes before occupation? Pre-occupation. What comes before what you do for a living? What you think about. What you’re preoccupied with. What you think about will drive you towards what you work on. So when you’re asking yourself, “What comes next? What should I work on?” Another way of asking that question is, “What occupies my thinking right now?” And if what you’re occupied with doesn’t align with what you’re preoccupied with, perhaps it's time for a change. Email · Mastodon · Bluesky

2 days ago • 3 votes

American hype

There's no country on earth that does hype better than America. It's one of the most appealing aspects about being here. People are genuinely excited about the future and never stop searching for better ways to work, live, entertain, and profit. There's a unique critical mass in the US accelerating and celebrating tomorrow. The contrast to Europe couldn't be greater. Most Europeans are allergic to anything that even smells like a commercial promise of a better tomorrow. "Hype" is universally used as a term to ridicule anyone who dares to be excited about something new, something different. Only a fool would believe that real progress is possible! This is cultural bedrock. The fault lines have been settling for generations. It'll take an earthquake to move them. You see this in AI, you saw it in the Internet. Europeans are just as smart, just as inventive as their American brethren, but they don't do hype, so they're rarely the ones able to sell the sizzle that public opinion requires to shift its vision for tomorrow. To say I have a complicated relationship with venture capital is putting it mildly. I've spent a career proving the counter narrative. Proving that you can build and bootstrap an incredible business without investor money, still leave a dent in the universe, while enjoying the spoils of capitalism. And yet... I must admit that the excesses of venture capital are integral to this uniquely American advantage on hype. The lavish overspending during the dot-com boom led directly to a spectacular bust, but it also built the foundation of the internet we all enjoy today. Pets.com and Webvan flamed out such that Amazon and Shopify could transform ecommerce out of the ashes. We're in the thick of peak hype on AI right now. Fantastical sums are chasing AGI along with every dumb derivative mirage along the way. The most outrageous claims are being put forth on the daily. It's easy to look at that spectacle with European eyes and roll them. Some of it is pretty cringe! But I think that would be a mistake. You don't have to throw away your critical reasoning to accept that in the face of unknown potential, optimism beats pessimism. We all have to believe in something, and you're much better off believing that things can get better than not. Americans fundamentally believe this. They believe the hype, so they make it come to fruition. Not every time, not all of them, but more of them, more of the time than any other country in the world. That really is exceptional.

2 days ago • 3 votes

File sync is very slow

I’m working on a Go library appendstore for append-only store of lots of things in a single file. To make things as robust as possible I was calling os.File.Sync() after each append. Sync() is waiting until the data is acknowledged as truly, really written to disk (as opposed to maybe floating somewhere in disk drive’s write buffer). Oh boy, is it slow. A test of appending 1000 records would take over 5 seconds. After removing the Sync() it would drop to 5 milliseconds. 1000x faster. I made sync optional - it’s now up to the user of the library to pick it, defaults to non-sync. Is it unsafe now? Well, the reality is that it probably doesn’t matter. I don’t think lots of software does the sync due to slowness and the world still runs.

2 days ago • 2 votes

New here?