Full Width [alt+shift+f] Shortcuts [alt+shift+k]
Sign Up [alt+shift+s] Log In [alt+shift+l]
22
Cruise doesn’t carry passengers in heavy rain. The operational design domain (ODD) in their CPUC permit (PDF) only allows services in light rain. I’ve always wondered how they implement this operationally. For example, Waymo preemptively launches all cars with operators in the driver’s seat anytime there’s rain in the forecast. Cruise has no such policy: I have never seen them assign operators to customer-facing vehicles. Yet Cruise claims to run up to 100 driverless vehicles concurrently. It would be impractical to dispatch a human driver to each vehicle whenever it starts raining. When the latest atmostpheric river hit San Francisco, I knew it was my chance to find out how it worked. Monitoring the Cruise app As the rain intensified, as expected, all cars disappeared from Cruise’s app and the weather pause icon appeared. But then something unusual happened. The app returned to its normal state. A few cars showed up near a hole in the geofence — and they were actually...
over a year ago

Improve your reading experience

Logged in users get linked directly to articles resulting in a better reading experience. Please login for free, it takes less than 1 minute.

More from Kevin Chen

Real estate is one of the hardest open problems in scaled self driving

I’ve had a minor obsession with Waymo’s autonomous vehicle depots recently. Over the past few months, I’ve flown a drone as part of a stakeout to understand how they work. And I’ve taken a deep dive into an apparent Waymo outage to find the company charging its electric vehicles from temporary diesel generators. The reason for my obsession? I believe depot buildouts will be one of the last hard problems in scaled autonomous driving. Long after the hardware, software, and AI have been perfected, real estate acquisition will remain a limiting factor in large-scale AV deployment. Waymo’s main depot at 201 Toland Street, San Francisco. Will self driving follow software scaling laws? In 2021, Elon Musk claimed that Tesla FSD’s release will be “one of the biggest asset value increases in history.” The day FSD goes to wide release will be one of the biggest asset value increases in history — Elon Musk (@elonmusk) October 20, 2021 Musk is arguing that, once autonomous driving has been solved, it can be instantly rolled out at the push of a button. Nearly all of Tesla’s fleet could be put to productive use without humans behind the wheel. While Musk’s viewpoint is on the extreme end, it’s a sentiment shared by many who have worked on or invested in autonomous driving over the years. Once you have hardware capable of supporting safe driverless operation, it’s just a matter of developing the right software. Software can be replicated infinitely at zero marginal cost. Could autonomous driving therefore scale as quickly as software platforms like Uber or DoorDash? The answer is not so simple. Self-driving cars are still cars — cars that exist in the physical world and need to be parked, fueled, cleaned, and repaired. Uber and other multi-sided marketplace platforms have been able to grow exponentially because they distribute these responsibilities to the individual drivers — many Uber drivers park at their own homes — allowing the platform provider to focus on developing the software pieces. So far, AV companies like Waymo and Cruise have taken a different approach. They’ve preferred to centralize these operational tasks in large depots staffed with their own personnel. This is because AV technology is still maturing and cannot be easily productized in the short term. Additionally, Timothy B. Lee notes in Understanding AI that “having hardware, software, and support services all under one roof makes it easier for Waymo to experiment with different technologies and business models.” When the kinks are still being worked out, it is more straightforward to vertically integrate everything in a single organization. The many jobs to be done of a robotaxi depot Depots for human-driven fleets, such as rental cars or delivery vans, only require a parking area with minimal additional infrastructure. This enables a fairly straightforward trade-off between location and cost: the fleet operator seeks a location close to customer demand while minimizing rent. For example, a logistics company participating in Amazon’s Delivery Service Partner program can run its depot from any sufficiently cheap parking lot near the local Amazon warehouse. The same constraints affect depot selection for autonomous vehicles. However, the depot also needs to be more than just a convenient parking lot to store off-duty cars. Because AVs are often also EVs, the ideal site also has electric vehicle charging. Because AVs need to upload driving logs to the cloud, it should have a high-speed Internet connection too. Let’s explore these constraints in detail. Location Depots should be placed close to customer demand to minimize deadheading (non-revenue driving), which would raise costs while degrading the customer experience with longer pickup times. Ideal depots are therefore located in desirable residential or commercial areas, where there is more competition among potential tenants. Placing depots in high-demand neighborhoods instead of industrial areas can also increase the probability of local opposition. Waymo has already encountered opposition during a proposed expansion of their main depot, even though it is located in an industrial neighborhood with many similar facilities. Again from Timothy B. Lee: Waymo sought a permit to convert the warehouse next door into some office space and a parking lot for Waymo employees. San Francisco’s Board of Supervisors unanimously rejected Waymo’s application. The rejection was partly based on fears that Waymo would eventually use the space to launch a delivery service in the city (Waymo hasn’t announced any plans to do this so far). But it also reflected city leaders’ frustration with their general lack of power over Waymo. Now consider the recent incident in which driverless Waymo vehicles honked at each other while entering a depot near residential buildings in San Francisco, often well into the early hours of the morning. While residents and the company resolved the situation amicably, it will surely be raised in future discussions of new Waymo depots in residential areas should they come before the Board of Supervisors or Planning Commission. Electric vehicle charging AV developers have preferred to run their services with electric vehicles. Although AV and EV technologies are not inherently coupled, running a fully electric fleet adds an environmental angle to the AV sales pitch, allowing the companies to claim that AV rides reduce emissions by displacing gas-powered driving. An EV fleet also lowers vehicle maintenance costs. Waymo and Cruise each have locations with DC fast charging capability. This approach avoids relying on public chargers. Taking Waymo’s primary San Francisco depot as an example, the company installed 38 chargers of approximately 60 kW each, implying a total site power of around 2.4 MW. Waymo vehicles charging in San Francisco. Approximately one-third of parking spots in the main depot have charging. Bringing in so many high-power chargers likely added significant complexity to Waymo’s depot construction. While we don’t Waymo’s process, we have a fairly good benchmark from the Tesla community, which tracks Tesla Supercharger installations closely. From Bruce Mah, a seasoned EV charging observer, the construction process is: As with any construction project, things usually start with selecting a site and permitting. There will often be some demolition / excavation of part of a parking lot (Superchargers are often built in existing parking lots). Tesla equipment such as charging cabinets, posts, etc. will usually be installed next (see T1 below). Eventually there will be some inspections from the local Authority Having Jurisdiction (AHJ). A utility transformer (from PG&E, SCE, etc.) is usually the last piece of equipment to be installed. Repaving, painting, and installation of parking stops will also usually happen late in the process, as well as landscaping and lighting enhancements. Of these steps, permitting and utility work are not within the charging operator’s control. California municipalities, especially San Francisco, have a notoriously slow and political permitting process. With PG&E, the utility serving much of the state, electrical service upgrades involving a new distribution transformer can take months. Timelines aside, building out a charging site is also expensive. For example, an agreement between Tesla and the City of West Hollywood values an eight-plug location at $482,942 for both equipment and construction. Data offload The final piece of the puzzle is data offload. Autonomous vehicles log vast amounts of data as they drive, measured in hundreds of GBs to TBs per hour. Some of the data is subject to mandatory retention and must be uploaded for later review. At a minimum, all AV collisions in California must be reported to the DMV. Regulators at all levels of government expect the AV developer to present analyses of serious incidents, including recordings from the vehicle and explanations of the AV’s decisions. In addition to regulatory requirements, the AV developer often wants to return much more data for engineering purposes: near misses, stuck events, novel or interesting scenarios, and more. The upshot is that the AV operator needs to upload a substantial portion of the hundreds of GBs to TBs logged per hour of driving. Uploading over cellular networks would not be cost effective. These transfers must occur at a depot. Today, it’s likely that Waymo and Cruise use disk swapping for data offload. When a car fills up its internal logging disk, it notifies an operator to plug in a fresh one. The full disks may be uploaded directly from the depot or shipped to a datacenter. This whole process is labor intensive and, over time, may pose a reliability concern due to dust or water ingress. A Waymo operator performs a possible disk swap. Many AV developers are moving toward direct data transfer from the vehicle using Ethernet, Wi-Fi, or a private 5G network, which reduces the number of manual touch points and moving parts. Charging is a great time to perform these transfers. However, this imposes an additional requirement on the depot: a fast upload speed, probably a fiber connection of at least 10 Gbps. Where do we go from here? When we put all three requirements together (great location, high-power EV charging, and high-speed Internet), there may be few to none sites that fit the bill. This would require the AV operator to take on site-specific construction projects to add amenities like charging and Internet — a strategy that sits in direct opposition to rapid and cost-effective scaling. Another possibility is to engineer ways to relax the constraints. Decoupling the requirements Waymo and Cruise do not require all of their locations to have charging and data offload. For example, Waymo operates satellite lots in downtown San Francisco only for storing their off-hail vehicles. Every night, fleet management software instructs the cars to travel back to the main depot for charging and data offload. A Waymo satellite location in San Francisco with minimal staffing, no charging, and apparently no data offloading. This solution works as long as the total charging and data transfer capacity across all locations exceeds the average throughput required to keep the fleet in working order. However, the lack of redundancy can lead to cascading failures, such as the apparent power outage at Waymo’s main depot that led the company to shut down many vehicles during a Friday evening rush hour. Reducing charging power Waymo and Cruise currently use DC fast charging (DCFC) for their fleets. Level 2 (L2) or AC charging could reduce the cost of buildouts because the equipment is cheaper and can often be added without bringing a new utility transformer. This could enable overnight charging in satellite locations that do not currently have any charging capacity. Imagine an operator showing up to plug in all the cars at night when there is little demand, then returning in the morning to unplug them. There is an order of magnitude speed difference between L2 and DCFC. This is important for consumer charging, where the consumer cares about the time to get a single car back on the road. However, charging power for any individual car becomes less important when charging a large fleet. Fleet operators care about the total throughput of turning around cars, which is proportional to total power delivered across all chargers. In other words, assuming an autonomous ride hailing service will always have overnight lulls in demand and enough parking spots during those times, the most scalable strategy is to procure your desired total charging power at the lowest price. DCFC equipment costs disproportionately more per kW due to the additional complexity of the charging equipment — and that doesn’t include the additional maintenance complexity. The table below compares ChargePoint’s cheapest L2 and DCFC units: Charger Power (kW) Price ($) Unit Price ($/kW) ChargePoint CPF50 9.6 kW $1,299 $135/kW ChargePoint CPE250 62.5 kW $52,000 $832/kW In addition to more scalable depot buildouts, reduced charging power can also increase the longevity of the vehicle’s traction battery, which is an important factor in managing vehicle depreciation. Reducing data logging rate Most AV developers start out by logging and uploading all data generated on their vehicles. This makes development easy because the data is always there when you need it. These assumptions need to be broken when a growing fleet generates proportionally more logs, most of which contain routine driving and are not very interesting. We can split the data logged by AVs into two categories: Raw sensor data, such as lidar point clouds, camera images, radar returns, and audio. Derived data, such as detections from the perception system or motion plans from the behavior system. One approach is to keep only one category of data. Retaining only the derived data can still enable debugging of serious incidents, as long as the perception system can be trusted to provide a faithful representation of the raw sensor data. On the other hand, retaining only the raw sensor data makes the logs more useful for developing the mapping and perception system. Similar-looking derived data can be generated by running a replay simulator as needed, but it is challenging to reproduce the exact same outputs as those on the vehicle unless the AV software is fully deterministic. Data retention decisions can also be made temporally. The key challenge here is high-recall classification of which time ranges in the log must be retained. For example, if a DMV-reportable collision occurs, the associated log data must never be discarded. These decisions can happen either on-device or in the cloud, but they must be made without uploading the full log to the cloud, since our bottleneck is the connection from the vehicle to the Internet. Conclusion The current trajectory for scaled autonomous driving would require desirable depot locations to include charging and Internet, making real estate acquisition challenging. There exist opportunities to reduce the additional requirements over time with the goal of making the problem closer to “rent a bunch of conveniently located parking lots.” While these are not traditionally considered autonomous driving problems, solving them will be key to unlocking the next phase of scaling.

6 months ago 56 votes
Large language models are a sustaining innovation for Siri

Many people assume that large language models (LLMs) will disrupt existing consumer voice assistants. Compared to Siri, while today’s ChatGPT is largely unable to complete real-world tasks like hailing an Uber, it’s far better than Siri at understanding and generating language, especially in response to novel requests. From Tom’s Hardware, this captures the sentiment I see among tech commentators: GPT-4o will enable ChatGPT to become a legitimate Siri competitor, with real-time conversations via voice that are responded to instantly without lag time. […] ChatGPT’s new real-time responses make tools like Siri and Echo seem lethargic. And although ChatGPT likely won’t be able to schedule your haircuts like Google Assistant can, it did put up admirable real-time translating chops to challenge Google. Last year, there were rumors that OpenAI was working on its own hardware, which would open the possibility of integrating ChatGPT at the system level along the lines of the Humane Ai Pin. Would such a product be able to mount a successful challenge against Siri, Alexa, and Google Assistant? After Apple’s WWDC keynote yesterday and seeing the updated Siri APIs, I think it’s more likely that LLMs are a sustaining innovation for Siri — a technological innovation that strengthens the position of incumbent voice assistants. Apple promised to increase the number of ways in which Siri can take action in apps. Source: Apple. Traditional voice assistants work by matching the user’s queries to a fixed set of intents. Pierce Freeman explains the general approach: The previous generation of personal assistants had control logic that was largely hard-coded. They revolved around the idea of an intent — a known task that a user wanted to do like sending a message, searching for weather, etc. Detecting this intent might be keyword based or trained into a model that converts a sequence to a one-hot class space. But generally speaking there were discrete tasks and the job of the NLU pipeline was to delegate it to sub-modules. Once the query has been matched to an intent, the next step is to “fill in the blanks” for any inputs needed by the intent: If it believes you’re looking for weather, a sub-module would attempt to detect what city you’re asking about. This motivated a lot of the research into NER (named entity recognition) to detect the more specific objects of interest and map them to real world quantities. city:San Francisco and city:SF to id:4467 for instance. Conversational history was implemented by keeping track of what the user had wanted in previous steps. If a new message is missing some intent, it would assume that a previous message in the flow had a relevant intent. This process of back-detecting the relevant intent was mostly hard-coded or involved a shallow model. A natural outcome is that Apple is forced to develop an expansive and complicated system of intents because it is the only way to expand the assistant’s capabilities. In 2016, Apple also allowed third-party developers to integrate their apps’ functionality by providing intents through SiriKit. Once the developer defines the inputs and outputs, the intents could appear in Siri, the Shortcuts app, proactive notifications, etc. alongside first-party intents by Apple. Similar frameworks exist on other platforms: Android App Actions and Alexa Skills. However, no matter how rich the intent library becomes, the overall user experience can still suffer if access is gated by a brittle intent-matching process: either (1) matching to the incorrect intent, or (2) after matching the correct intent, parsing the request parameters incorrectly. In my opinion, the intent matching stage is the primary source of users’ frustration with Siri. Incorrect named entity recognition by Siri. Contrast this with ChatGPT plugins, a similar system that allows the model to interact with external APIs by determining which plugin might be relevant to the user’s request, then reading the plugin’s API specification to determine the input and output parameters. In other words, the intent matching is performed by an LLM. The generalist nature of LLMs seems to reduce brittleness. For example, when using the code interpreter plugin, the model can write arbitrary Python code and fix resulting runtime errors. The main issue for challengers (OpenAI, Humane, and Rabbit) is the lack of third-party integrations to make their assistants helpful in consumers’ digital lives, extending beyond general knowledge tasks. For example: The Humane Ai Pin only streams music from Tidal, not Spotify nor Apple Music. The Rabbit R1 “large action model” is, in reality, just a few handwritten UI automation scripts for the applications shown in their demo. The system does not appear to generalize to unseen applications. In general, while companies working on UI agents have shown some limited demos, I’m not aware of any that run with high reliability and scale. Even if they achieve scale and generalization, their agents could be made less reliable by app developers using anti-scraping techniques because the developers prefer to own the customer relationship, or as leverage for future partnership negotiations. This type of system is probably a few years out at a minimum. Without a large user base, developers have no incentive to port their apps, leaving the integration work to the platform owner, as in the cases of Humane and Rabbit. Meanwhile, Apple, Amazon, and Google each have a pre-existing app ecosystem. Their position as aggregators means developers are highly motivated to access the enormous installed base of iOS, Alexa, and Android devices. Assuming LLM technology will become a commodity, the incumbents’ in-house LLMs ought to be able to provide decent intent matching and language skills. Combined with an expansive library of intents, it seems very possible that integrating LLMs will cement the incumbent voice assistants as the dominant platforms. Challengers might be better off building products in areas that don’t depend on third-party integrations for key functionality.

9 months ago 54 votes
How autonomous vehicle simulation works

When autonomous vehicle developers justify the safety of their driverless vehicle deployments, they lean heavily on their testing in simulation. Common talking points take the form of “we made our car drive X billion miles in simulation.” From these vague statements, it’s challenging to determine what a simulator is, or how it works. There’s more to simulation than endless driving in a virtual environment. For example, Waymo’s technology overview page says (emphasis mine): We’ve driven more than 20 billion miles in simulation to help identify the most challenging situations our vehicles will encounter on public roads. We can either replay and tweak real-world miles or build completely new virtual scenarios, for our autonomous driving software to practice again and again. Cruise’s safety page contains similar language:1 Before setting out on public roads, Cruise vehicles complete more than 250,000 simulations and closed course testing during everyday and extreme conditions. The main impression one gets from these overviews is that (1) simulation can test many driving scenarios, and (2) everyone will be super impressed if you use it a lot. Going one layer deeper to the few blog posts and talks full of slick GIFs, you might reach the conclusion that simulation is like a video game for the autonomous vehicle in the vein of Grand Theft Auto (GTA): a fully generated 3D environment complete with textures, lighting, and non-player characters (NPCs). Much like human players of GTA, the autonomous vehicle would be able to drive however it likes, freed from real-world consequences. Source: Cruise. While this type of fully synthetic simulation exists in the world of autonomous driving, it’s actually the least commonly used type of simulation.2 Instead, just as a software developer leans on many kinds of testing before releasing an application, an AV developer runs many types of simulation before deploying an autonomous vehicle. Each type of simulation is best suited for a particular use case, with trade-offs between realism, coverage, technical complexity, and cost to operate. In this post, we’ll walk through the system design of a simulator at a hypothetical AV company, starting from first principles. We may never know the details of the actual simulator architecture used by any particular AV developer. However, by exploring the design trade-offs from first principles, I hope to shed some light on how this key system works. Contents Our imaginary self-driving car Replay simulation Interactivity and the pose divergence problem Synthetic simulation The high cost of realistic imagery Round-trip conversions to pixels and back Skipping the sensor data Making smart agents Generating scene descriptions Limitations of pure synthetic simulation Hybrid simulation Conclusion Our imaginary self-driving car Let’s begin by defining our hypothetical autonomous driving software, which will help us illustrate how simulation fits into the development process. Imagine it’s 2015, the peak of self-driving hype, and our team has raised a vast sum of money to develop an autonomous vehicle. Like a human driver, our software drives by continuously performing a few basic tasks: It makes observations about the road and other road users. It reasons about what others might do and plans how it should drive. Finally, it executes those planned motions by steering, accelerating, and braking. Rinse and repeat. This mental model helps us group related code into modules, enabling them to be developed and tested independently. There will be four modules in our system:3 Sensor Interface: Take in raw sensor data such as camera images and lidar point clouds. Sensing: Detect objects such as vehicles, pedestrians, lane lines, and curbs. Behavior: Determine the best trajectory (path) for the vehicle to drive. Vehicle Interface: Convert the trajectory into steering, accelerator, and brake commands to control the vehicle’s drive-by-wire (DBW) system. We connect our modules to each other using an inter-process communication framework (“middleware”) such as ROS, which provides a publish–subscribe system (pubsub) for our modules to talk to each other. Here’s a concrete example of our module-based encapsulation system in action: The sensing module publishes a message containing the positions of other road users. The behavior module subscribes to this message when it wants to know whether there are pedestrians nearby. The behavior module doesn’t know and doesn’t care how the perception module detected those pedestrians; it just needs to see a message that conforms to the agreed-upon API schema. Defining a schema for each message also allows us to store a copy of everything sent through the pubsub system. These driving logs will come in handy for debugging because it allows us to inspect the system with module-level granularity. Our full system looks like this: Simplified architecture diagram for an autonomous vehicle. Now it’s time to take our autonomous vehicle for a spin. We drive around our neighborhood, encountering some scenarios in which our vehicle drives incorrectly, which cause our in-car safety driver to take over driving from the autonomous vehicle. Each disengagement gets reviewed by our engineering team. They analyze the vehicle’s logs and propose some software changes. Now we need a way to prove our changes have actually improved performance. We need the ability to compare the effectiveness of multiple proposed fixes. We need to do this quickly so our engineers can receive timely feedback. We need a simulator! Replay simulation Motivated by the desire to make progress quickly, we try the simplest solution first. The key insight: our software modules don’t care where the incoming messages come from. Could we simulate a past scenario by simply replaying messages from our log as if they were being sent in real time? As the name suggests, this is exactly how replay simulation works. Under normal operation, the input to our software is sensor data captured from real sensors. The simulator replaces this by replaying sensor data from an existing log. Under normal operation, the output of our software is a trajectory (or a set of accelerator and steering commands) that the real car executes. The simulator intercepts the output to control the simulated vehicle’s position instead. Modified architecture diagram for running replay simulation. There are two primary ways we can use this type of simulator, depending on whether we use a different software version as the onroad drive: Different software: By running modified versions of our modules in the simulator, we can get a rough idea of how the changes will affect the vehicle’s behavior. This can provide early feedback on whether a change improves the vehicle’s behavior or successfully fixes a bug. Same software: After a disengagement, we may want to know what would have happened if the autonomous vehicle were allowed to continue driving without human input. Simulation can provide this counterfactual by continuing to play back messages as if the disengagement never happened. We’ve gained these important testing capabilities with relatively little effort. Rather than take on the complexity of a fully generated 3D environment, we got away with a few modifications to our pubsub framework. Interactivity and the pose divergence problem The simplicity of a pure replay simulator also leads to its key weakness: a complete lack of interactivity. Everything in the simulated environment was loaded verbatim from a log. Therefore, the environment does not respond to the simulated vehicle’s behavior, which can lead to unrealistic interactions with other road users. This classic example demonstrates what can happen when the simulated vehicle’s behavior changes too much: Watch on YouTube. Dragomir Anguelov’s guest lecture at MIT. Source: Lex Fridman. Our vehicle, when it drove in the real world, was where the green vehicle is. Now, in simulation, we drove differently and we have the blue vehicle. So we’re driving…bam. What happened? Well, there is a purple agent over there — a pesky purple agent — who, in the real world, saw that we passed them safely. And so it was safe for them to go, but it’s no longer safe, because we changed what we did. So the insight is: in simulation, our actions affect the environment and needed to be accounted for. Anguelov’s video shows the simulated vehicle driving slower than the real vehicle. This kind of problem is called pose divergence, a term that covers any simulation where differences in the simulated vehicle’s driving decisions cause its position to differ from the real-world vehicle’s position. In the video, the pose divergence leads to an unrealistic collision in simulation. A reasonable driver in the purple vehicle’s position would have observed the autonomous vehicle and waited for it to pass before entering the intersection.4 However, in replay simulation, all we can do is play back the other driver’s actions verbatim. In general, problems arising from the lack of interactivity mean the simulated scenario no longer provides useful feedback to the AV developer. This is a pretty serious limitation! The whole point of the simulator is to allow the simulated vehicle to make different driving decisions. If we cannot trust the realism of our simulations anytime there is an interaction with another road user, it rules out a lot of valuable use cases. Synthetic simulation We can solve these interactivity problems by using a simulated environment to generate synthetic inputs that respond to our vehicle’s actions. Creating a synthetic simulation usually starts with a high-level scene description containing: Agents: fully interactive NPCs that react to our vehicle’s behavior. Environments: 3D models of roads, signs, buildings, weather, etc. that can be rendered from any viewpoint. From the scene description, we can generate different types of synthetic inputs for our vehicle to be injected at different layers of its software stack, depending on which modules we want to test. In synthetic sensor simulation, the simulator uses a game engine to render the scene description into fake sensor data, such as camera images, lidar point clouds, and radar returns. The simulator sets up our software modules to receive the generated imagery instead of sensor data logged from real-world driving. Modified architecture diagram for running synthetic simulation with generated sensors. The same game engine can render the scene from any arbitrary perspective, including third-person views. This is how they make all those slick highlight reels. The high cost of realistic imagery Simulations that generate fake sensor data can be quite expensive, both to develop and to run. The developer needs to create a high-quality 3D environment with realistic object models and lighting rivaling AAA games. Example of Cruise’s synthetic simulation showing the same scene rendered into synthetic camera, lidar, and radar data. Source: Cruise. For example, a Cruise blog post mentions some elements of their synthetic simulation roadmap (emphasis mine): With limited time and resources, we have to make choices. For example, we ask how accurately we should model tires, and whether or not it is more important than other factors we have in our queue, like modeling LiDAR reflections off of car windshields and rearview mirrors or correctly modeling radar multipath returns. Even if rendering reflections and translucent surfaces is already well understood in computer graphics, Cruise may still need to make sure their renderer generates realistic reflections that resemble their lidar. This challenge gives a sense of the attention to detail required. It’s only one of many that needs to be solved when building a synthetic sensor simulator. So far, we have only covered the high development costs. Synthetic sensor simulation also incurs high variable costs every time simulation is run. Round-trip conversions to pixels and back By its nature, synthetic sensor simulation performs a round-trip conversion to and from synthetic imagery to test the perception system. The game engine first renders its scene description to synthetic imagery for each sensor on the simulated vehicle, burning many precious GPU-hours in the process, only to have the perception system perform the inverse operation when it detects the objects in the scene to produce the autonomous vehicle’s internal scene representation.5 Every time you launch a synthetic sensor simulation, NVIDIA, Intel, and/or AWS are laughing all the way to the bank. Despite the expense of testing the perception system with synthetic simulation, it is also arguably less effective than testing with real-world imagery paired with ground truth labels. With real imagery, there can be no question about its realism. Synthetic imagery never looks quite right. These practical limitations mean that synthetic sensor simulation ends up as the least used simulator type in AV companies. Usually, it’s also the last type of simulator to be built at a new company. Developers don’t need synthetic imagery most of the time, especially when they have at their disposal a fleet of vehicles that can record the real thing. On the other hand, we cannot easily test risky driving behavior in the real world. For example, it is better to synthesize a bunch of red light runners than try to find them in the real world. This means we are primarily using synthetic simulation to test the behavior system. Skipping the sensor data In synthetic agent simulation, the simulator uses a high-level scene description to generate synthetic outputs from the perception/sensing system. In software development terms, it’s like replacing the perception system with a mock to focus on testing downstream components. This type of simulation requires fewer computational resources to run because the scene description doesn’t need to make a round-trip conversion to sensor data. Modified architecture diagram for running synthetic simulation with generated agents. With image quality out of the picture, the value of synthetic simulation rests solely on the quality of the scenarios it can create. We can split this into two main challenges: designing agents with realistic behaviors generating the scene descriptions containing various agents, street layouts, and environmental conditions Making smart agents You could start developing the control policy for a smart agent similar to NPC design in early video games. A basic smart agent could simply follow a line or a path without reacting to anyone else, which could be used to test the autonomous vehicle’s reaction to a right of way violation. A fancier smart agent could follow a path while also maintaining a safe following distance from the vehicle in front. This type of agent could be placed behind our simulated vehicle, resolving the rear-ending problem mentioned above. Like an audience of demanding gamers, the users of our simulator quickly expect increasingly complex and intelligent behaviors from the smart agents. An ideal smart agent system would capture the full spectrum of every action that other road users could possibly take. This system would also generate realistic behaviors, including realistic-looking trajectories and reaction times, so that we can trust the outcomes of simulations involving smart agents. Finally, our smart agents need to be controllable: they can be given destinations or intents, enabling developers to design simulations that test specific scenarios. Watch on YouTube. Two Cruise simulations in which smart agents (orange boxes) interact with the autonomous vehicle. In the second simulation, two parked cars have been inserted into the bottom of the visualization. Notice how the smart agents and the autonomous vehicle drive differently in the two simulations as they interact with each other and the additional parked cars. Source: Cruise. Developing a great smart agent policy ends up falling in the same difficulty ballpark as developing a great autonomous driving policy. The two systems may even share technical foundations. For example, they may have a shared component that is trained to predict the behaviors of other road users, which can be used for both planning our vehicle’s actions and for generating realistic agents in simulation. Generating scene descriptions Even with the ability to generate realistic synthetic imagery and realistic smart agent behaviors, our synthetic simulation is not complete. We still need a broad and diverse dataset of scene descriptions that can thoroughly test our vehicle. These scene descriptions usually come from a mix of sources: Automatic conversion from onroad scenarios: We can write a program that takes a logged real-world drive, guesses the intent of other road users, and stores those intents as a synthetic simulation scenario. Manual design: Analogous to a level editor in a video game. A human either builds the whole scenario from scratch or makes manual edits to an automatic conversion. For example, a human can design a scenario based on a police report of a human-on-human-driver collision to simulate what the vehicle might have done in that scenario. Generative AI: Recent work from Zoox uses diffusion models trained on a large dataset of onroad scenarios. Example of a real-world log (top) converted to a synthetic simulation scenario, then rendered into synthetic camera images (bottom). Notice how some elements, such as the protest signs, are not carried over, perhaps because they are not supported by the perception system or the scene converter. Source: Cruise. Scenarios can also be fuzzed, where the simulator adds random noise to the scene parameters, such as the speed limit of the road or the goals of simulated agents. This can upsample a small number of converted or manually designed scenes to a larger set that can be used to check for robustness and prevent overfitting. Fuzzing can also help developers understand the space of possible outcomes, as shown in the example below, which fuzzes the reaction time of a synthetic tailgater: An example of fuzzing tailgater reaction time. Source: Waymo. The distribution on the right shows a dot for each variant of the scenario, colored green or red depending on whether a simulated collision occurred. In this experiment, the collision becomes unavoidable once the simulated tailgater’s reaction time exceeds about 1 second. Limitations of pure synthetic simulation With these sources plus fuzzing, we’ve ensured the quantity of scenarios in our library, but we still don’t have any guarantees on the quality. Perhaps the scenarios we (and maybe our generative AI tools) invent are too hard or too easy compared to the distribution of onroad driving our vehicle encounters. If our vehicle drives poorly in a synthetic scenario, does the autonomous driving system need improvement? Or is the scenario unrealistically hard, perhaps because the behavior of its smart agents is too unreasonable? If our vehicle passes with flying colors, is it doing a good job? Or is the scenario library missing some challenging scenarios simply because we did not imagine that they could happen? This is a fundamental problem of pure synthetic simulation. Once we start modifying and fuzzing our simulated scenarios, there isn’t a straightforward way to know whether they remain representative of the real world. And we still need to collect a large quantity of real-world mileage to ensure that we have not missed any rare scenarios. Hybrid simulation We can combine our two types of simulator into a hybrid simulator that takes advantages of the strengths of each, providing an environment that is both realistic and interactive without breaking the bank. From replay simulation, use log replay to ensure every simulated scenario is rooted in a real-world scenario and has perfectly realistic sensor data. From synthetic simulation, make the simulation interactive by selectively replacing other road users with smart agents if they could interact with our vehicle.6 Modified architecture diagram merging parts of replay and synthetic simulation. Hybrid simulation usually serves as the default type of simulation that works well for most use cases. One convenient interpretation is that hybrid simulation is a worry-free replacement for replay simulation: anytime the developer would have used replay, they can absentmindedly switch to hybrid simulation to take care of the most common simulation artifacts while retaining most of the benefits of replay simulation. Conclusion We’ve seen that there are many types of simulation used in autonomous driving. They exist on a spectrum from purely replaying onroad scenarios to fully synthesized environments. The ideal simulation platform allows developers to pick an operating point on that spectrum that fits their use case. Hybrid simulation based on a large volume of real-world miles satisfies most testing needs at a reasonable cost, while fully synthetic modes serve niche use cases that can justify the higher development and operating costs. Cruise has written several deep dives about the usage and scaling of their simulation platform. However, neither Cruise nor Waymo provide many details on the construction of their simulator. ↩ I’ve even heard arguments that it’s only good for making videos. ↩ There exist architectures that are more end-to-end. However, to the best of my knowledge, those systems do not have driverless deployments with nontrivial mileage, making simulation testing less relevant. ↩ Another interactivity problem arises from the replay simulator’s inability to simulate different points of view as the simulated vehicle moves. A large pose divergence often causes the simulated vehicle to drive into an area not observed by the vehicle that produced the onroad log. For example, a simulated vehicle could decide to drive around a corner much earlier. But it wouldn’t be able to see anything until the log data also rounds the corner. No matter where the simulated vehicle drives, it will always be limited to what the logged vehicle saw. ↩ “Computer vision is inverse computer graphics.” ↩ As a nice bonus, because the irrelevant road users are replayed exactly as they drove in real life, this may reduce the compute cost of simulation. ↩

9 months ago 40 votes
Why autonomous trucking is harder than autonomous rideshare

Recently, The Verge asked, “where are all the robot trucks?” It’s a good question. Trucking was supposed to be the ideal first application of autonomous driving. Freeways contain predictable, highly structured driving scenarios. An autonomous truck would not have to deal with the complexities of intersections and two-way traffic. It could easily drive hundreds of miles without encountering a single pedestrian. DALL-E 3 prompt: “Generate an artistic, landscape aspect ratio watercolor painting of a truck with a bright red cab, pulling a white trailer. The truck drives uphill on an empty, rural highway during wintertime, lined with evergreen trees and a snow bank on a foggy, cloudy day.” The trucks could also be commercially viable with only freeway driving capability, or freeways plus a short segment of surface streets needed to reach a transfer hub. The AV company would only need to deal with a limited set of businesses as customers, bypassing the messiness of supporting a large pool of consumers inherent to the B2C model. Autonomous trucks would not be subject to rest requirements. As The Verge notes, “truck operators are allowed to drive a maximum of 11 hours a day and have to take a 30-minute rest after eight consecutive hours behind the wheel. Autonomous trucks would face no such restrictions,” enabling them to provide a service that would be literally unbeatable by a human driver. If you had asked me in 2018, when I first started working in the AV industry, I would’ve bet that driverless trucks would be the first vehicle type to achieve a million-mile driverless deployment. Aurora even pivoted their entire company to trucking in 2020, believing it to be easier than city driving. Yet sitting here in 2024, we know that both Waymo and Cruise have driven millions of miles on city streets — a large portion in the dense urban environment of San Francisco — and there are no driverless truck deployments. What happened? I think the problem is that driverless autonomous trucking is simply harder than driverless rideshare. The trucking problem appears easier at the outset, and indeed many AV developers quickly reach their initial milestones, giving them false confidence. But the difficulty ramps up sharply when the developer starts working on the last bit of polish. They encounter thorny problems related to the high speeds on freeways and trucks’ size, which must be solved before taking the human out of the driver’s seat. What is the driverless bar? Here’s a simplistic framework: No driver in the vehicle. No guarantee of a timely response from remote operators or backend services. Therefore, all safety-critical decisions must be made by the onboard computer alone. Under these constraints, the system still meets or exceeds human safety level. This is a really, really high bar. For example, on surface streets, this means the system on its own is capable of driving at least 100k miles without property damage and 40M miles without fatality.1 The system can still have flaws, but virtually all of those problems must result in a lack of progress, rather than collision or injury. In short, while the system may not know the right thing to do in every scenario, it should never do the wrong thing. (There are several high quality safety frameworks for those interested in a rigorous definition.23 It’s beyond the scope of this post.) Now, let’s look at each aspect of trucking to see how it exacerbates these challenges. Truck-specific challenges Stopping distance vs. sensing range The required sensor capability for an autonomous vehicle is determined by the most challenging scenario that the vehicle needs to handle. A major challenge in trucking is stopping behind a stalled vehicle or large debris in a travel lane. To avoid collision, the autonomous vehicle would need a sensing range greater than or equal to its stopping distance. We’ll make a simplifying assumption that stopping distance defines the minimum detection range requirements. A driverless-quality perception system needs perfect recall on other vehicles within the vehicle’s worst-case stopping distance. Passenger vehicles can decelerate up to –8 m/s². Trucks can only achieve around –4 m/s², which increases the stopping distance and puts the sensing range requirement right at the edge of what today’s sensors can deliver. Here are the sight stopping distances for an empty truck in dry conditions on roads of varying grade:4 Speed (mph) 0% Grade (m) –3% Grade (m) –6% Grade (m) 50 115–141 124–150 136–162 70 122–178 136–162 236–305 Sight stopping distances defined as the distance needed to stop assuming a 2.5-second reaction time with no braking, followed by maximum braking. The distance is computed for an empty truck in dry conditions on roads of varying grade. Stopping distance increases in wet weather or when driving downhill with a load (not shown). Now let’s compare these distances with the capabilities of various sensors: Lidar sensors provide trustworthy 3D data because they take direct measurements based on physical principles. They have a usable range of around 200–250 meters, plenty for city driving but not enough for every truck use case. Lidar detection models may also need to accumulate multiple scans/frames over time to detect faraway objects reliably, especially for smaller items like debris, further decreasing the usable detection range. Note that some solid-state lidars claim significantly more range than 250 meters. These numbers are collected under ideal conditions; for computing minimum sensing capability, we are interested in the range that can provide perfect recall and really great precision. For example, the lidar may be unable to reach its maximum range over the entire field of view, or may require undesirable trade-offs like a scan pattern that reduces point density and field of view to achieve more range. Radar can see farther than lidar. For example, this high-end ZF radar claims vehicle detections up to 350 meters away. Radar is great for tracking moving vehicles, but has trouble distinguishing between stationary vehicles and other background objects. Tesla Autopilot has infamously shown this problem by braking for overpasses and running into stalled vehicles. “Imaging” radars like the ZF device will do better than the radars on production vehicles. They still do not have the azimuth resolution to separate objects beyond 200 meters, where radar input is most needed. Cameras can detect faraway objects as long as there are enough pixels on the object, which leads to the selection of cameras with high resolution and a narrow field of view (telephoto lens). A vehicle will carry multiple narrow cameras for full coverage during turns. However, cameras cannot measure distance or speed directly. A combined camera + radar system using machine learning probably has the best chance here, especially with recent advances in ML-based early fusion, but would need to perform well enough to serve as the primary detection source beyond 200 meters. Training such a model is closer to an open problem than simply receiving that data from a lidar. In summary, we don’t appear to have any sensing solutions with the performance needed for trucks to meet the driverless bar. Controls Controlling a passenger vehicle — determining the amount of steering and throttle input to make the vehicle follow a trajectory — is a simpler problem than controlling a truck. For example, passenger vehicles are generally modeled as a single rigid body, while a truck and its trailer can move separately. The planner and controller need to account for this when making sharp turns and, in extreme low-friction conditions, to avoid jackknifing. These features come in addition to all the usual controls challenges that also apply to passenger vehicles. They can be built but require additional development and validation time. Freeway-specific challenges OK, so trucks are hard, but what about the freeway part? It may now sound appealing to build L4 freeway autonomy for passenger vehicles. However, driving on freeways also brings additional challenges on top of what is needed for city streets. Achieving the minimal risk condition on freeways Autonomous vehicles are supposed to stop when they detect an internal fault or driving situation that they can’t handle. This is called the minimal risk condition (MRC). For example, an autonomous passenger vehicle that detects an error in the HD map or a sensor failure might be programmed to execute a pullover or stop in lane depending on the problem severity. While MRC behaviors are annoying for other road users and embarrassing for the AV developer, they do not add undue risk on surface streets given the low speeds and already chaotic nature of city driving. This gives the AV developer more breathing room (within reason) to deploy a system that does not know how to handle every driving scenario perfectly, but knows enough to stay out of trouble. It’s a different story on the freeway. Stopping in lane becomes much more dangerous with the possibility of a rear-end collision at high speed. All stopping should be planned well in advance, ideally exiting at the next ramp, or at least driving to the closest shoulder with enough room to park. This greatly increases the scope of edge cases that need to be handled autonomously and at freeway speeds. For example: Scene understanding: If the vehicle encounters an unexpected construction zone, crash site, or other non-nominal driving scenario, it’s not enough to detect and stop. Rerouting, while a viable option on surface streets, usually isn’t an option on freeways because it may be difficult or illegal to make a u-turn by the time the vehicle can see the construction. A freeway under construction is also more likely to be the only path to the destination, especially if the autonomous vehicle in question is not designed to drive on city streets. Operational solutions are also not enough for a scaled deployment. AV developers often disallow their vehicles from routing through known problem areas gathered from manually driven scouting vehicles or announcements made by authorities. For a scaled deployment, however, it’s not reasonable to know the status of every mile of road at all times. Therefore, the system needs to find the right path through unstructured scenarios, possibly following instructions from police directing traffic, even if it involves traffic violations such as driving on the wrong side of the road. We know that current state-of-the-art autonomous vehicles still occasionally drive into wet concrete and trenches, which shows it is nontrivial to make a correct decision. Mapping: If the lane lines have been repainted, and the system normally uses an HD map, it needs to ignore the map and build a new one on-the-fly from the perception system’s output. It needs to distinguish between mapping and perception errors. Uptime: Sensor, computer, and software failures need to be virtually eliminated through redundancy and/or engineering elbow grease. The system needs almost perfect uptime. For example, it’s fine to enter a max-braking MRC when losing a sensor or restarting a software module on surface streets, provided those failures are rare. The same maneuver would be dangerous on the freeway, so the failure must be eliminated, or a fallback/redundancy developed. These problems are not impossible to overcome. Every autonomous passenger vehicle has solved them to some extent, with the remaining edge cases punted to some combination of MRC and remote operators. The difference is that, on freeways, they need to be solved with a very high level of reliability to meet the driverless bar. Freeways are boring The features that make freeways simpler — controlled access, no intersections, one-way traffic — also make “interesting” events more rare. This is a double-edged sword. While the simpler environment reduces the number of software features to be developed, it also increases the iteration time and cost. During development, “interesting” events are needed to train data-hungry ML models. For validation, each new software version to be qualified for driverless operation needs to encounter a minimum number of “interesting” events before comparisons to a human safety level can have statistical significance. Overall, iteration becomes more expensive when it takes more vehicle-hours to collect each event. AV developers can only respond by increasing the size of their operations teams or accepting more time between software releases. (Note that simulation is not a perfect solution either. The rarity of events increases vehicle-hours run in simulation, and so far, nobody has shown a substitute for real-world miles in the context of driverless software validation.) Is it ever going to happen? Trucking requires longer range sensing and more complex controls, increasing system complexity and pushing the problem to the bleeding edge of current sensing capabilities. At the same time, driving on freeways brings additional reliability requirements, raising the quality bar on every software component from mapping to scene understanding. If both the truck form factor and the freeway domain increase the level of difficulty, then driverless trucking might be the hardest application of autonomous driving: City Freeway Cars Baseline Harder Trucks Harder Hardest Now that scaled rideshare is mostly working in cities, I expect to see scaled freeway rideshare next. Does this mean driverless trucking will never happen? No, I still believe AV developers will overcome these challenges eventually. Aurora, Kodiak, and Gatik have all promised some form of driverless deployment by the end of the year. We probably won’t see anything close to a million-mile deployment in 2024 though. Getting there will require advances in sensing, machine learning, and a lot of hard work. Thanks to Steven W. and others for the discussions and feedback. This should be considered a bare minimum because humans perform much better on freeways, raising the bar for AVs. Rough numbers taken from Table 3, passenger vehicle national average on surface streets: Scanlon, J. M., Kusano, K. D., Fraade-Blanar, L. A., McMurry, T. L., Chen, Y. H., & Victor, T. (2023). Benchmarks for Retrospective Automated Driving System Crash Rate Analysis Using Police-Reported Crash Data. arXiv preprint arXiv:2312.13228. (blog) ↩ Kalra, N., & Paddock, S. M. (2016). Driving to safety: How many miles of driving would it take to demonstrate autonomous vehicle reliability? Transportation Research Part A: Policy and Practice, 94, 182-193. ↩ Favaro, F., Fraade-Blanar, L., Schnelle, S., Victor, T., Peña, M., Engstrom, J., … & Smith, D. (2023). Building a Credible Case for Safety: Waymo’s Approach for the Determination of Absence of Unreasonable Risk. arXiv preprint arXiv:2306.01917. (blog) ↩ Computed from tables 1 and 2: Harwood, D. W., Glauz, W. D., & Mason, J. M. (1989). Stopping sight distance design for large trucks. Transportation Research Record, 1208, 36-46. ↩

a year ago 16 votes

More in programming

Supa Pecha Kucha

slug: supapechakucha

23 hours ago 3 votes
The Power of Principles in Web Development Decision-Making (article)

Discover how The Epic Programming Principles can transform your web development decision-making, boost your career, and help you build better software.

15 hours ago 3 votes
Our switch to Kamal is complete

In a fit of frustration, I wrote the first version of Kamal in six weeks at the start of 2023. Our plan to get out of the cloud was getting bogged down in enterprisey pricing and Kubernetes complexity. And I refused to accept that running our own hardware had to be that expensive or that convoluted. So I got busy building a cheap and simple alternative.  Now, just two years later, Kamal is deploying every single application in our entire heritage fleet, and everything in active development. Finalizing a perfectly uniform mode of deployment for every web app we've built over the past two decades and still maintain. See, we have this obsession at 37signals: That the modern build-boost-discard cycle of internet applications is a scourge. That users ought to be able to trust that when they adopt a system like Basecamp or HEY, they don't have to fear eviction from the next executive re-org. We call this obsession Until The End Of The Internet. That obsession isn't free, but it's worth it. It means we're still operating the very first version of Basecamp for thousands of paying customers. That's the OG code base from 2003! Which hasn't seen any updates since 2010, beyond security patches, bug fixes, and performance improvements. But we're still operating it, and, along with every other app in our heritage collection, deploying it with Kamal. That just makes me smile, knowing that we have customers who adopted Basecamp in 2004, and are still able to use the same system some twenty years later. In the meantime, we've relaunched and dramatically improved Basecamp many times since. But for customers happy with what they have, there's no forced migration to the latest version. I very much had all of this in mind when designing Kamal. That's one of the reasons I really love Docker. It allows you to encapsulate an entire system, with all of its dependencies, and run it until the end of time. Kind of how modern gaming emulators can run the original ROM of Pac-Man or Pong to perfection and eternity. Kamal seeks to be but a simple wrapper and workflow around this wondrous simplicity. Complexity is but a bridge — and a fragile one at that. To build something durable, you have to make it simple.

an hour ago 1 votes
We all lose when art is anonymised

One rabbit hole I can never resist going down is finding the original creator of a piece of art. This sounds simple, but it’s often quite difficult. The Internet is a maze of social media accounts that only exist to repost other people’s art, usually with minimal or non-existent attribution. A popular image spawns a thousand copies, each a little further from the original. Signatures get cropped, creators’ names vanish, and we’re left with meaningless phrases like “no copyright intended”, as if that magically absolves someone of artistic theft. Why do I do this? I’ve always been a bit obsessive, a bit completionist. I’ve worked in cultural heritage for eight years, which has made me more aware of copyright and more curious about provenance. And it’s satisfying to know I’ve found the original source, that I can’t dig any further. This takes time. It’s digital detective work, using tools like Google Lens and TinEye, and it’s not always easy or possible. Sometimes the original pops straight to the top, but other times it takes a lot of digging to find the source of an image. So many of us have become accustomed to art as an endless, anonymous stream of “content”. A beautiful image appears in our feed, we give it a quick heart, and scroll on, with no thought for the human who sweated blood and tears to create it. That original artist feels distant, disconected. Whatever benefit they might get from the “exposure” of your work going viral, they don’t get any if their name has been removed first. I came across two examples recently that remind me it’s not just artists who miss out – it’s everyone who enjoys art. I saw a photo of some traffic lights on Tumblr. I love their misty, nighttime aesthetic, the way the bright colours of the lights cut through the fog, the totality of the surrounding darkness. But there was no name – somebody had just uploaded the image to their Tumblr page, it was reblogged a bunch of times, and then it appeared on my dashboard. Who took it? I used Google Lens to find the original photographer: Lucas Zimmerman. Then I discovered it was part of a series. And there was a sequel. I found interviews. Context. Related work. I found all this cool stuff, but only because I knew Lucas’s name. Traffic Lights, by Lucas Zimmerman. Published on Behance.net under a CC BY‑NC 4.0 license, and reposted here in accordance with that license. The second example was a silent video of somebody making tiny chess pieces, just captioned “wow”. It was clearly an edit of another video, with fast-paced cuts to make it accommodate a short attention span – and again with no attribution. This was a little harder to find – I had to search several frames in Google Lens before I found a summary on a Russian website, which had a link to a YouTube video by metalworker and woodworker Левша (Levsha). This video is four times longer than the cut-up version I found, in higher resolution, and with commentary from the original creator. I don’t speak Russian, but YouTube has auto-translated subtitles. Now I know how this amazing set was made, and I have a much better understanding of the materials and techniques involved. (This includes the delightful name Wenge wood, which I’d never heard before.) https://youtube.com/watch?v=QoKdDK3y-mQ A piece of art is more than just a single image or video. It’s a process, a human story. When art is detached from its context and creator, we lose something fundamental. Creators lose the chance to benefit from their work, and we lose the opportunity to engage with it in a deeper way. We can’t learn how it was made, find their other work, or discover how to make similar art for ourselves. The Internet has done many wonderful things for art, but it’s also a machine for endless copyright infringement. It’s not just about generative AI and content scraping – those are serious issues, but this problem existed long before any of us had heard of ChatGPT. It’s a thousand tiny paper cuts. How many of us have used an image from the Internet because it showed up in a search, without a second thought for its creator? When Google Images says “images may be subject to copyright”, how many of us have really thought about what that means? Next time you want to use an image from the web, look to see if it’s shared under a license that allows reuse, and make sure you include the appropriate attribution – and if not, look for a different image. Finding the original creator is hard, sometimes impossible. The Internet is full of shadows: copies of things that went offline years ago. But when I succeed, it feels worth the effort – both for the original artist and myself. When I read a book or watch a TV show, the credits guide me to the artists, and I can appreciate both them and the rest of their work. I wish the Internet was more like that. I wish the platforms we rely on put more emphasis on credit and attribution, and the people behind art. The next time an image catches your eye, take a moment. Who made this? What does it mean? What’s their story? [If the formatting of this post looks odd in your feed reader, visit the original article]

yesterday 1 votes
Apple does AI as Microsoft did mobile

When the iPhone first appeared in 2007, Microsoft was sitting pretty with their mobile strategy. They'd been early to the market with Windows CE, they were fast-following the iPod with their Zune. They also had the dominant operating system, the dominant office package, and control of the enterprise. The future on mobile must have looked so bright! But of course now, we know it wasn't. Steve Ballmer infamously dismissed the iPhone with a chuckle, as he believed all of Microsoft's past glory would guarantee them mobile victory. He wasn't worried at all. He clearly should have been! After reliving that Ballmer moment, it's uncanny to watch this CNBC interview from one year ago with Johny Srouji and John Ternus from Apple on their AI strategy. Ternus even repeats the chuckle!! Exuding the same delusional confidence that lost Ballmer's Microsoft any serious part in the mobile game.  But somehow, Apple's problems with AI seem even more dire. Because there's apparently no one steering the ship. Apple has been promising customers a bag of vaporware since last fall, and they're nowhere close to being able to deliver on the shiny concept demos. The ones that were going to make Apple Intelligence worthy of its name, and not just terrible image generation that is years behind the state of the art. Nobody at Apple seems able or courageous enough to face the music: Apple Intelligence sucks. Siri sucks. None of the vaporware is anywhere close to happening. Yet as late as last week, you have Cook promoting the new MacBook Air with "Apple Intelligence". Yikes. This is partly down to the org chart. John Giannandrea is Apple's VP of ML/AI, and he reports directly to Tim Cook. He's been in the seat since 2018. But Cook evidently does not have the product savvy to be able to tell bullshit from benefit, so he keeps giving Giannandrea more rope. Now the fella has hung Apple's reputation on vaporware, promised all iPhone 16 customers something magical that just won't happen, and even spec-bumped all their devices with more RAM for nothing but diminished margins. Ouch. This is what regression to the mean looks like. This is what fiefdom management looks like. This is what having a company run by a logistics guy looks like. Apple needs a leadership reboot, stat. That asterisk is a stain.

2 days ago 3 votes