Full Width [alt+shift+f] Shortcuts [alt+shift+k]
Sign Up [alt+shift+s] Log In [alt+shift+l]
71
This is a post for myself, because I wasted a lot of time understanding this bug, and I want to be able to remember it in the future. I expect close to zero others to be interested. The C standard library function isspace() returns a non-zero value (true) for the six "standard" ASCII white-space characters ('\t', '\n', '\v', '\f', '\r', ' '), and any locale-specific characters. By default, a program starts in the "C" locale, which will only return true for the six ASCII white-space characters. However, if the program changes locales, it can return true for other values. As a result, unless you really understand locales, you should use your own version of this function, or ICU4C's u_isspace() function. An implementation of isspace() for ASCII is one line: /* Returns true for the 6 ASCII white-space characters: \t \n \v \f \r ' '. */ int isspace_ascii(int c) { return c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r' || c == ' '; } I ran into this because On Mac OS X,...
a year ago

Improve your reading experience

Logged in users get linked directly to articles resulting in a better reading experience. Please login for free, it takes less than 1 minute.

More from Evan Jones - Software Engineer | Computer Scientist

Setenv is not Thread Safe and C Doesn't Want to Fix It

You can't safely use the C setenv() or unsetenv() functions in a program that uses threads. Those functions modify global state, and can cause other threads calling getenv() to crash. This also causes crashes in other languages that use those C standard library functions, such as Go's os.Setenv (Go issue) and Rust's std::env::set_var() (Rust issue). I ran into this in a Go program, because Go's built-in DNS resolver can call C's getaddrinfo(), which uses environment variables. This cost me 2 days to track down and file the Go bug. Sadly, this problem has been known for decades. For example, an article from January 2017 said: "None of this is new, but we do re-discover it roughly every five years. See you in 2022." This was only one year off! (She wrote an update in October 2023 after I emailed her about my Go bug.) This is a flaw in the POSIX standard, which extends the C Standard to allow modifying environment varibles. The most infuriating part is that many people who could influence the standard or maintain the C libraries don't see this as a problem. The argument is that the specification clearly documents that setenv() cannot be used with threads. Therefore, if someone does this, the crashes are their fault. We should apparently read every function's specification carefully, not use software written by others, and not use threads. These are unrealistic assumptions in modern software. I think we should instead strive to create APIs that are hard to screw up, and evolve as the ecosystem changes. The C language and standard library continue to play an important role at the base of most software. We either need to figure out how to improve it, or we need to figure out how to abandon it. Why is setenv() not thread-safe? The biggest problem is that getenv() returns a char*, with no need for applications to free it later. One thread could be using this pointer when another thread changes the same environment variable using setenv() or unsetenv(). The getenv() function is perfect if environment variables never change. For example, for accessing a process's initial table of environment variables (see the System V ABI: AMD64 Section 3.4.1). It turns out the C Standard only includes getenv(), so according to C, that is exactly how this should work. However, most implementations also follow the POSIX standard (e.g. POSIX.1-2017), which extends C to include functions that modify the environment. This means the current getenv() API is problematic. Even worse, putenv() adds a char* to the set of environment variables. It is explicitly required that if the application modifies the memory after putenv() returns, it modifies the environment variables. This means applications can modify the value passed to putenv() at any time, without any synchronization. FreeBSD used to implement putenv() by copying the value, but it changed it with FreeBSD 7 in 2008, which suggests some programs really do depend on modifying the environment in this fashion (see FreeBSD putenv man page). As a final problem, environ is a NULL-terminated array of pointers (char**) that an application can read and assign to (see definition in POSIX.1-2017). This is how applications can iterate over all environment variables. Accesses to this array are not thread-safe. However, in my experience many fewer applications use this than getenv() and setenv(). However, this does cause some libraries to not maintain the set of environment variables in a thread-safe way, since they directly update this table. Environment variable implementations Implementations need to choose what do do when an application overwrites an existing variable. I looked at glibc, musl, Solaris/Illumos, and FreeBSD/Apple's C standard libraries, and they make the following choices: Never free environment variables (glibc, Solaris/Illumos): Calling setenv() repeatedly is effectively a memory leak. However, once a value is returned from getenv(), it is immutable and can be used by threads safely. Free the environment variables (musl, FreeBSD/Apple): Using the pointer returned by getenv() after another thread calls setenv() can crash. A second problem is ensuring the set of environment variables is updated in a thread-safe fashion. This is what causes crashes in glibc. glibc uses an array to hold pointers to the "NAME=value" strings. It holds a lock in setenv() when changing this array, but not in getenv(). If a thread calling setenv() needs to resize the array of pointers, it copies the values to a new array and frees the previous one. This can cause other threads executing getenv() to crash, since they are now iterating deallocated memory. This is particularly annoying since glibc already leaks environment variables, and holds a lock in setenv(). All it needs to do is hold the lock inside getenv(), and it would no longer crash. This would make getenv() slightly slower. However, getenv() already uses a linear search of the array, so performance does not appear to be a concern. More sophisticated implementations are possible if this is a problem, such as Solaris/Illumos's lock-free implementation. Why do programs use environment variables? Environment variables useful for configuring shared libraries or language runtimes that are included in other programs. This allows users to change the configuration, without program authors needing to explicitly pass the configuration in. One alternative is command line flags, which requires programs to parse them and pass them in to the libraries. Another alternative are configuration files, which then need some other way to disable or configure, to be able to test new configurations. Environment variables are a simple solution. AS a result, many libraries call getenv() (see a partial list below). Since many libraries are configured through environment variables, a program may need to change these variables to configure the libraries it uses. This is common at application startup. This causes programs to need to call setenv(). Given this issue, it seems like libraries should also provide a way to explicitly configure any settings, and avoid using environment variables. We should fix this problem, and we can In my opinion, it is rediculous that this has been a known problem for so long. It has wasted thousands of hours of people's time, either debugging the problems, or debating what to do about it. We know how to fix the problem. First, we can make a thread-safe implementation, like Illumos/Solaris. This has some limitations: it leaks memory in setenv(), and is still unsafe if a program uses putenv() or the environ variable. However, this is an improvement over the current Linux and Apple implementations. The second solution is to add new APIs to get one and get all environment variables that are thread-safe by design, like Microsoft's getenv_s() (see below for the controversy around C11's "Annex K"). My preferred solution would be to do both. This would reduce the chances of hitting this problem for existing programs and libraries, and also provide a path to avoid the problems entirely for new code or languages like Go and Rust. My rough idea would be the following: Add a function to copy one single environment variable to a user-specified buffer, similar to getenv_s(). Add a thread-safe API to iterate over all environment variables, or to copy all variables out. Mark getenv() as deprecated, recommending the new thread-safe getenv() function instead. Mark putenv() as deprecated, recommending setenv() instead. Mark environ as deprecated, recommending environment variable functions instead. Update the implementation of environment varibles to be thread-safe. This requires leaking memory if getenv() is used on a variable, but we can detect if the old functions are used, and only leak memory in that case. This means programs written in other languages will avoid these problems as soon as their runtimes are updated. Update the C and POSIX standards to require the above changes. This would be progress. The getenv_s / C Standard Annex K controversy Microsoft provides getenv_s(), which copies the environment variable into a caller-provided buffer. This is easy to make thread-safe by holding a read lock while copying the variable. After the function returns, future changes to the environment have no effect. This is included in the C11 Standard as Annex K "Bounds Checking Interfaces". The C standard Annexes are optional features. This Annex includes new functions intended to make it harder to make mistakes with buffers that are the wrong size. The first draft of this extension was published in 2003. This is when Microsoft was focusing on "Trustworthy Computing" after a January 2002 memo from Bill Gates. Basically, Windows wasn't designed to be connected to the Internet, and now that it was, people were finding many security problems. Lots of them were caused by buffer handling mistakes. Microsoft developed new versions of a number of problematic functions, and added checks to the Visual C++ compiler to warn about using the old ones. They then attempted to standardize these functions. My understanding is the people responsible for the Unix POSIX standards did not like the design of these functions, so they refused to implement them. For more details, see Field Experience With Annex K published in September 2015, Stack Overflow: Why didn't glibc implement _s functions? updated March 2023, and Rich Felker of musl on both technical and social reasons for not implementing Annex K from February 2019. I haven't looked at the rest of the functions, but having spent way too long looking at getenv(), the general idea of getenv_s() seems like a good idea to me. Standardizing this would help avoid this problem. Incomplete list of common environment variables This is a list of some uses of environment variables from fairly widely used libraries and services. This shows that environment variables are pretty widely used. Cloud Provider Credentials and Services AWS's SDKs for credentials (e.g. AWS_ACCESS_KEY_ID) Google Cloud Application Default Credentials (e.g. GOOGLE_APPLICATION_CREDENTIALS) Microsoft Azure Default Azure Credential (e.g. AZURE_CLIENT_ID) AWS's Lambda serverless product: sets a large number of variables like AWS_REGION, AWS_LAMBDA_FUNCTION_NAME, and credentials like AWS_SECRET_ACCESS_KEY Google Cloud Run serverless product: configuration like PORT, K_SERVICE, K_REVISION Kubernetes service discovery: Defines variables SERVICE_NAME_HOST and SERVICE_NAME_PORT. Third-party C/C++ Libraries OpenTelemetry: Metrics and tracing. Many environment variables like OTEL_SERVICE_NAME and OTEL_RESOURCE_ATTRIBUTES. OpenSSL: many configurable variables like HTTPS_PROXY, OPENSSL_CONF, OPENSSL_ENGINES. BoringSSL: Google's fork of OpenSSL used in Chrome and others. It reads SSLKEYLOGFILE just like OpenSSL for logging TLS keys for debugging. Libcurl: proxies, SSL/TLS configuration and debugging like HTTPS_PROXY, CURL_SSL_BACKEND, CURL_DEBUG. Libpq Postgres client library: connection parameters including credentials like PGHOSTADDR, PGDATABASE, and PGPASSWORD. Rust Standard Library std::thread RUST_MIN_STACK: Calls std::env::var() on the first call to spawn() a new thread. It is cached in a static atomic variable and never read again. See implementation in thread::min_stack(). std::backtrace RUST_LIB_BACKTRACE: Calls std::env::var() on the first call to capture a backtrace. It is cached in a static atomic variable and never read again. See implementation in Backtrace::enabled().

a year ago 32 votes
Random Load Balancing is Unevenly Distributed

This is a reminder that random load balancing is unevenly distributed. If we distribute a set of items randomly across a set of servers (e.g. by hashing, or by randomly selecting a server), the average number of items on each server is num_items / num_servers. It is easy to assume this means each server has close to the same number of items. However, since we are selecting servers at random, they will have different numbers of items, and the imbalance can be important. For load balancing, a reasonable model is that each server has fixed capacity (e.g. it can serve 3000 requests/second, or store 100 items, etc.). We need to divide the total workload over the servers, so that each server stays below its capacity. This means the number of servers is determined by the most loaded server, not the average. This is a classic balls in bins problem that has been well studied, and there are some interesting theoretical results. However, I wanted some specific numbers, so I wrote a small simulation. The summary is that the imbalance depends on the expected number of items per server (that is, num_items / num_servers). This means workload is more balanced with fewer servers, or with more items. This means that dividing a set of items over more servers makes the distribution more unfair, which is a reason we can get worse than linear scaling of a distributed system. Let's make this more concrete with an example. Let's assume we have a workload of 1000 items, and each server can hold a maximum of 100 items. If we place the exact same number of items on each server, we only need 10 servers, and each of them is completely busy. However, if we place the items randomly, then the median (p50) number of items is 100 items. This means half the servers will have more than 100 items, and will be overloaded. If we want less than a 1% chance of an overloaded server, we need to look at the 99th percentile (p99) server load. We need to use at least 13 servers, which has a p99 load of 97 items. For 14 servers, the average is 77 items, so our servers are on average 23% idle. This shows how the imbalance leads to wasted capacity. This is a bit of an extreme example, because the number of items is small. Let's assume we can make the items 10× smaller, say by dividing them into pieces. Our workload now consists of 10k items, and each server has the capacity to hold 1000 (1k) items. Our perfectly balanced workload still needs 10 servers. With random load balancing, to have a less than 1 in 1000 chance of exceeding our capacity, we only need 11 servers, which has a p99 load of 98 items and a p999 of 100 items. With 11 servers, the average number of items is 910 or 91%, so our servers are only 9% idle. This shows how splitting work into smaller pieces improves load balancing. Another way to look at this is to think about a scaling scenario. Let's go back to our workload of 1000 items, where each server can handle 100 items, and we have 13 servers to ensure we have less than a 1% chance of an overloaded server. Now let's assume the amount of work per item doubles, for example because the service has become more popular, so each item has become larger. Now, each server can hold a maximum of 50 items. If we have perfectly linear scaling, we can double the number of servers from 13 to 26 to handle this workload. However, 26 servers has a p99 of 53 items, so we again have a more than 1% chance of overload. We need to use 28 servers which has a p99 of 50 items. This means we doubled the workload, but had to increase the number of servers from 13 to 28, which is 2.15×. This is sub-linear scaling. As a way to visualize the imbalance, the chart below shows the p99 to average ratio, which is a measure of how imbalanced the system is. If everything is perfectly balanced, the value is 1.0. A value of 2.0 means 1% of servers will have double the number of items of the average server. This shows that the imbalance increases with the number of servers, and increases with fewer items. Power of Two Random Choices Another way to improve load balancing is to have smarter placement. Perfect placement can be hard, but it is often possible to use the "power of two random choices" technique: select two servers at random, and place the item on the least loaded of the two. This makes the distribution much more balanced. For 1000 items and 100 items/server, 11 servers has a p999 of 93 items, so much less than 0.1% chance of overload, compared to needing 14 servers with random load balancing. For the scaling scenario where each server can only handle 50 items, we only need 21 servers to have a p999 of 50 items, compared to 28 servers with random load balancing. The downside of the two choices technique is that each request is now more expensive, since it must query two servers instead of one. However, in many cases where the "item not found" requests are much less expensive than the "item found" requests, this can still be a substantial improvement. For another look at how this improves load balancing, with a nice simulation that includes information delays, see Marc Brooker's blog post. Raw simulation output I will share the code for this simulation later. simulating placing items on servers with random selection iterations=10000 (number of times num_items are placed on num_servers) measures the fraction of items on each server (server_items/num_items) and reports the percentile of all servers in the run P99_AVG_RATIO = p99 / average; approximately the worst server compared to average num_items=1000: num_servers=3 p50=0.33300 p95=0.35800 p99=0.36800 p999=0.37900 AVG=0.33333; P99_AVG_RATIO=1.10400; ITEMS_PER_NODE=333.3 num_servers=5 p50=0.20000 p95=0.22100 p99=0.23000 p999=0.24000 AVG=0.20000; P99_AVG_RATIO=1.15000; ITEMS_PER_NODE=200.0 num_servers=10 p50=0.10000 p95=0.11600 p99=0.12300 p999=0.13100 AVG=0.10000; P99_AVG_RATIO=1.23000; ITEMS_PER_NODE=100.0 num_servers=11 p50=0.09100 p95=0.10600 p99=0.11300 p999=0.12000 AVG=0.09091; P99_AVG_RATIO=1.24300; ITEMS_PER_NODE=90.9 num_servers=12 p50=0.08300 p95=0.09800 p99=0.10400 p999=0.11200 AVG=0.08333; P99_AVG_RATIO=1.24800; ITEMS_PER_NODE=83.3 num_servers=13 p50=0.07700 p95=0.09100 p99=0.09700 p999=0.10400 AVG=0.07692; P99_AVG_RATIO=1.26100; ITEMS_PER_NODE=76.9 num_servers=14 p50=0.07100 p95=0.08500 p99=0.09100 p999=0.09800 AVG=0.07143; P99_AVG_RATIO=1.27400; ITEMS_PER_NODE=71.4 num_servers=25 p50=0.04000 p95=0.05000 p99=0.05500 p999=0.06000 AVG=0.04000; P99_AVG_RATIO=1.37500; ITEMS_PER_NODE=40.0 num_servers=50 p50=0.02000 p95=0.02800 p99=0.03100 p999=0.03500 AVG=0.02000; P99_AVG_RATIO=1.55000; ITEMS_PER_NODE=20.0 num_servers=100 p50=0.01000 p95=0.01500 p99=0.01800 p999=0.02100 AVG=0.01000; P99_AVG_RATIO=1.80000; ITEMS_PER_NODE=10.0 num_servers=1000 p50=0.00100 p95=0.00300 p99=0.00400 p999=0.00500 AVG=0.00100; P99_AVG_RATIO=4.00000; ITEMS_PER_NODE=1.0 num_items=2000: num_servers=3 p50=0.33350 p95=0.35050 p99=0.35850 p999=0.36550 AVG=0.33333; P99_AVG_RATIO=1.07550; ITEMS_PER_NODE=666.7 num_servers=5 p50=0.20000 p95=0.21500 p99=0.22150 p999=0.22850 AVG=0.20000; P99_AVG_RATIO=1.10750; ITEMS_PER_NODE=400.0 num_servers=10 p50=0.10000 p95=0.11100 p99=0.11600 p999=0.12150 AVG=0.10000; P99_AVG_RATIO=1.16000; ITEMS_PER_NODE=200.0 num_servers=11 p50=0.09100 p95=0.10150 p99=0.10650 p999=0.11150 AVG=0.09091; P99_AVG_RATIO=1.17150; ITEMS_PER_NODE=181.8 num_servers=12 p50=0.08350 p95=0.09350 p99=0.09800 p999=0.10300 AVG=0.08333; P99_AVG_RATIO=1.17600; ITEMS_PER_NODE=166.7 num_servers=13 p50=0.07700 p95=0.08700 p99=0.09100 p999=0.09600 AVG=0.07692; P99_AVG_RATIO=1.18300; ITEMS_PER_NODE=153.8 num_servers=14 p50=0.07150 p95=0.08100 p99=0.08500 p999=0.09000 AVG=0.07143; P99_AVG_RATIO=1.19000; ITEMS_PER_NODE=142.9 num_servers=25 p50=0.04000 p95=0.04750 p99=0.05050 p999=0.05450 AVG=0.04000; P99_AVG_RATIO=1.26250; ITEMS_PER_NODE=80.0 num_servers=50 p50=0.02000 p95=0.02550 p99=0.02750 p999=0.03050 AVG=0.02000; P99_AVG_RATIO=1.37500; ITEMS_PER_NODE=40.0 num_servers=100 p50=0.01000 p95=0.01400 p99=0.01550 p999=0.01750 AVG=0.01000; P99_AVG_RATIO=1.55000; ITEMS_PER_NODE=20.0 num_servers=1000 p50=0.00100 p95=0.00250 p99=0.00300 p999=0.00400 AVG=0.00100; P99_AVG_RATIO=3.00000; ITEMS_PER_NODE=2.0 num_items=5000: num_servers=3 p50=0.33340 p95=0.34440 p99=0.34920 p999=0.35400 AVG=0.33333; P99_AVG_RATIO=1.04760; ITEMS_PER_NODE=1666.7 num_servers=5 p50=0.20000 p95=0.20920 p99=0.21320 p999=0.21740 AVG=0.20000; P99_AVG_RATIO=1.06600; ITEMS_PER_NODE=1000.0 num_servers=10 p50=0.10000 p95=0.10700 p99=0.11000 p999=0.11320 AVG=0.10000; P99_AVG_RATIO=1.10000; ITEMS_PER_NODE=500.0 num_servers=11 p50=0.09080 p95=0.09760 p99=0.10040 p999=0.10380 AVG=0.09091; P99_AVG_RATIO=1.10440; ITEMS_PER_NODE=454.5 num_servers=12 p50=0.08340 p95=0.08980 p99=0.09260 p999=0.09580 AVG=0.08333; P99_AVG_RATIO=1.11120; ITEMS_PER_NODE=416.7 num_servers=13 p50=0.07680 p95=0.08320 p99=0.08580 p999=0.08900 AVG=0.07692; P99_AVG_RATIO=1.11540; ITEMS_PER_NODE=384.6 num_servers=14 p50=0.07140 p95=0.07740 p99=0.08000 p999=0.08300 AVG=0.07143; P99_AVG_RATIO=1.12000; ITEMS_PER_NODE=357.1 num_servers=25 p50=0.04000 p95=0.04460 p99=0.04660 p999=0.04880 AVG=0.04000; P99_AVG_RATIO=1.16500; ITEMS_PER_NODE=200.0 num_servers=50 p50=0.02000 p95=0.02340 p99=0.02480 p999=0.02640 AVG=0.02000; P99_AVG_RATIO=1.24000; ITEMS_PER_NODE=100.0 num_servers=100 p50=0.01000 p95=0.01240 p99=0.01340 p999=0.01460 AVG=0.01000; P99_AVG_RATIO=1.34000; ITEMS_PER_NODE=50.0 num_servers=1000 p50=0.00100 p95=0.00180 p99=0.00220 p999=0.00260 AVG=0.00100; P99_AVG_RATIO=2.20000; ITEMS_PER_NODE=5.0 num_items=10000: num_servers=3 p50=0.33330 p95=0.34110 p99=0.34430 p999=0.34820 AVG=0.33333; P99_AVG_RATIO=1.03290; ITEMS_PER_NODE=3333.3 num_servers=5 p50=0.20000 p95=0.20670 p99=0.20950 p999=0.21260 AVG=0.20000; P99_AVG_RATIO=1.04750; ITEMS_PER_NODE=2000.0 num_servers=10 p50=0.10000 p95=0.10500 p99=0.10700 p999=0.10940 AVG=0.10000; P99_AVG_RATIO=1.07000; ITEMS_PER_NODE=1000.0 num_servers=11 p50=0.09090 p95=0.09570 p99=0.09770 p999=0.09990 AVG=0.09091; P99_AVG_RATIO=1.07470; ITEMS_PER_NODE=909.1 num_servers=12 p50=0.08330 p95=0.08790 p99=0.08980 p999=0.09210 AVG=0.08333; P99_AVG_RATIO=1.07760; ITEMS_PER_NODE=833.3 num_servers=13 p50=0.07690 p95=0.08130 p99=0.08320 p999=0.08530 AVG=0.07692; P99_AVG_RATIO=1.08160; ITEMS_PER_NODE=769.2 num_servers=14 p50=0.07140 p95=0.07570 p99=0.07740 p999=0.07950 AVG=0.07143; P99_AVG_RATIO=1.08360; ITEMS_PER_NODE=714.3 num_servers=25 p50=0.04000 p95=0.04330 p99=0.04460 p999=0.04620 AVG=0.04000; P99_AVG_RATIO=1.11500; ITEMS_PER_NODE=400.0 num_servers=50 p50=0.02000 p95=0.02230 p99=0.02330 p999=0.02440 AVG=0.02000; P99_AVG_RATIO=1.16500; ITEMS_PER_NODE=200.0 num_servers=100 p50=0.01000 p95=0.01170 p99=0.01240 p999=0.01320 AVG=0.01000; P99_AVG_RATIO=1.24000; ITEMS_PER_NODE=100.0 num_servers=1000 p50=0.00100 p95=0.00150 p99=0.00180 p999=0.00210 AVG=0.00100; P99_AVG_RATIO=1.80000; ITEMS_PER_NODE=10.0 num_items=100000: num_servers=3 p50=0.33333 p95=0.33579 p99=0.33681 p999=0.33797 AVG=0.33333; P99_AVG_RATIO=1.01043; ITEMS_PER_NODE=33333.3 num_servers=5 p50=0.20000 p95=0.20207 p99=0.20294 p999=0.20393 AVG=0.20000; P99_AVG_RATIO=1.01470; ITEMS_PER_NODE=20000.0 num_servers=10 p50=0.10000 p95=0.10157 p99=0.10222 p999=0.10298 AVG=0.10000; P99_AVG_RATIO=1.02220; ITEMS_PER_NODE=10000.0 num_servers=11 p50=0.09091 p95=0.09241 p99=0.09304 p999=0.09379 AVG=0.09091; P99_AVG_RATIO=1.02344; ITEMS_PER_NODE=9090.9 num_servers=12 p50=0.08334 p95=0.08477 p99=0.08537 p999=0.08602 AVG=0.08333; P99_AVG_RATIO=1.02444; ITEMS_PER_NODE=8333.3 num_servers=13 p50=0.07692 p95=0.07831 p99=0.07888 p999=0.07954 AVG=0.07692; P99_AVG_RATIO=1.02544; ITEMS_PER_NODE=7692.3 num_servers=14 p50=0.07143 p95=0.07277 p99=0.07332 p999=0.07396 AVG=0.07143; P99_AVG_RATIO=1.02648; ITEMS_PER_NODE=7142.9 num_servers=25 p50=0.04000 p95=0.04102 p99=0.04145 p999=0.04193 AVG=0.04000; P99_AVG_RATIO=1.03625; ITEMS_PER_NODE=4000.0 num_servers=50 p50=0.02000 p95=0.02073 p99=0.02103 p999=0.02138 AVG=0.02000; P99_AVG_RATIO=1.05150; ITEMS_PER_NODE=2000.0 num_servers=100 p50=0.01000 p95=0.01052 p99=0.01074 p999=0.01099 AVG=0.01000; P99_AVG_RATIO=1.07400; ITEMS_PER_NODE=1000.0 num_servers=1000 p50=0.00100 p95=0.00117 p99=0.00124 p999=0.00132 AVG=0.00100; P99_AVG_RATIO=1.24000; ITEMS_PER_NODE=100.0 power of two choices num_items=1000: num_servers=3 p50=0.33300 p95=0.33400 p99=0.33500 p999=0.33600 AVG=0.33333; P99_AVG_RATIO=1.00500; ITEMS_PER_NODE=333.3 num_servers=5 p50=0.20000 p95=0.20100 p99=0.20200 p999=0.20300 AVG=0.20000; P99_AVG_RATIO=1.01000; ITEMS_PER_NODE=200.0 num_servers=10 p50=0.10000 p95=0.10100 p99=0.10200 p999=0.10200 AVG=0.10000; P99_AVG_RATIO=1.02000; ITEMS_PER_NODE=100.0 num_servers=11 p50=0.09100 p95=0.09200 p99=0.09300 p999=0.09300 AVG=0.09091; P99_AVG_RATIO=1.02300; ITEMS_PER_NODE=90.9 num_servers=12 p50=0.08300 p95=0.08500 p99=0.08500 p999=0.08600 AVG=0.08333; P99_AVG_RATIO=1.02000; ITEMS_PER_NODE=83.3 num_servers=13 p50=0.07700 p95=0.07800 p99=0.07900 p999=0.07900 AVG=0.07692; P99_AVG_RATIO=1.02700; ITEMS_PER_NODE=76.9 num_servers=14 p50=0.07200 p95=0.07300 p99=0.07300 p999=0.07400 AVG=0.07143; P99_AVG_RATIO=1.02200; ITEMS_PER_NODE=71.4 num_servers=25 p50=0.04000 p95=0.04100 p99=0.04200 p999=0.04200 AVG=0.04000; P99_AVG_RATIO=1.05000; ITEMS_PER_NODE=40.0 num_servers=50 p50=0.02000 p95=0.02100 p99=0.02200 p999=0.02200 AVG=0.02000; P99_AVG_RATIO=1.10000; ITEMS_PER_NODE=20.0 num_servers=100 p50=0.01000 p95=0.01100 p99=0.01200 p999=0.01200 AVG=0.01000; P99_AVG_RATIO=1.20000; ITEMS_PER_NODE=10.0 num_servers=1000 p50=0.00100 p95=0.00200 p99=0.00200 p999=0.00300 AVG=0.00100; P99_AVG_RATIO=2.00000; ITEMS_PER_NODE=1.0 power of two choices num_items=2000: num_servers=3 p50=0.33350 p95=0.33400 p99=0.33400 p999=0.33450 AVG=0.33333; P99_AVG_RATIO=1.00200; ITEMS_PER_NODE=666.7 num_servers=5 p50=0.20000 p95=0.20050 p99=0.20100 p999=0.20150 AVG=0.20000; P99_AVG_RATIO=1.00500; ITEMS_PER_NODE=400.0 num_servers=10 p50=0.10000 p95=0.10050 p99=0.10100 p999=0.10100 AVG=0.10000; P99_AVG_RATIO=1.01000; ITEMS_PER_NODE=200.0 num_servers=11 p50=0.09100 p95=0.09150 p99=0.09200 p999=0.09200 AVG=0.09091; P99_AVG_RATIO=1.01200; ITEMS_PER_NODE=181.8 num_servers=12 p50=0.08350 p95=0.08400 p99=0.08400 p999=0.08450 AVG=0.08333; P99_AVG_RATIO=1.00800; ITEMS_PER_NODE=166.7 num_servers=13 p50=0.07700 p95=0.07750 p99=0.07800 p999=0.07800 AVG=0.07692; P99_AVG_RATIO=1.01400; ITEMS_PER_NODE=153.8 num_servers=14 p50=0.07150 p95=0.07200 p99=0.07250 p999=0.07250 AVG=0.07143; P99_AVG_RATIO=1.01500; ITEMS_PER_NODE=142.9 num_servers=25 p50=0.04000 p95=0.04050 p99=0.04100 p999=0.04100 AVG=0.04000; P99_AVG_RATIO=1.02500; ITEMS_PER_NODE=80.0 num_servers=50 p50=0.02000 p95=0.02050 p99=0.02100 p999=0.02100 AVG=0.02000; P99_AVG_RATIO=1.05000; ITEMS_PER_NODE=40.0 num_servers=100 p50=0.01000 p95=0.01050 p99=0.01100 p999=0.01100 AVG=0.01000; P99_AVG_RATIO=1.10000; ITEMS_PER_NODE=20.0 num_servers=1000 p50=0.00100 p95=0.00150 p99=0.00200 p999=0.00200 AVG=0.00100; P99_AVG_RATIO=2.00000; ITEMS_PER_NODE=2.0 power of two choices num_items=5000: num_servers=3 p50=0.33340 p95=0.33360 p99=0.33360 p999=0.33380 AVG=0.33333; P99_AVG_RATIO=1.00080; ITEMS_PER_NODE=1666.7 num_servers=5 p50=0.20000 p95=0.20020 p99=0.20040 p999=0.20060 AVG=0.20000; P99_AVG_RATIO=1.00200; ITEMS_PER_NODE=1000.0 num_servers=10 p50=0.10000 p95=0.10020 p99=0.10040 p999=0.10040 AVG=0.10000; P99_AVG_RATIO=1.00400; ITEMS_PER_NODE=500.0 num_servers=11 p50=0.09100 p95=0.09120 p99=0.09120 p999=0.09140 AVG=0.09091; P99_AVG_RATIO=1.00320; ITEMS_PER_NODE=454.5 num_servers=12 p50=0.08340 p95=0.08360 p99=0.08360 p999=0.08380 AVG=0.08333; P99_AVG_RATIO=1.00320; ITEMS_PER_NODE=416.7 num_servers=13 p50=0.07700 p95=0.07720 p99=0.07720 p999=0.07740 AVG=0.07692; P99_AVG_RATIO=1.00360; ITEMS_PER_NODE=384.6 num_servers=14 p50=0.07140 p95=0.07160 p99=0.07180 p999=0.07180 AVG=0.07143; P99_AVG_RATIO=1.00520; ITEMS_PER_NODE=357.1 num_servers=25 p50=0.04000 p95=0.04020 p99=0.04040 p999=0.04040 AVG=0.04000; P99_AVG_RATIO=1.01000; ITEMS_PER_NODE=200.0 num_servers=50 p50=0.02000 p95=0.02020 p99=0.02040 p999=0.02040 AVG=0.02000; P99_AVG_RATIO=1.02000; ITEMS_PER_NODE=100.0 num_servers=100 p50=0.01000 p95=0.01020 p99=0.01040 p999=0.01040 AVG=0.01000; P99_AVG_RATIO=1.04000; ITEMS_PER_NODE=50.0 num_servers=1000 p50=0.00100 p95=0.00120 p99=0.00140 p999=0.00140 AVG=0.00100; P99_AVG_RATIO=1.40000; ITEMS_PER_NODE=5.0 power of two choices num_items=10000: num_servers=3 p50=0.33330 p95=0.33340 p99=0.33350 p999=0.33360 AVG=0.33333; P99_AVG_RATIO=1.00050; ITEMS_PER_NODE=3333.3 num_servers=5 p50=0.20000 p95=0.20010 p99=0.20020 p999=0.20030 AVG=0.20000; P99_AVG_RATIO=1.00100; ITEMS_PER_NODE=2000.0 num_servers=10 p50=0.10000 p95=0.10010 p99=0.10020 p999=0.10020 AVG=0.10000; P99_AVG_RATIO=1.00200; ITEMS_PER_NODE=1000.0 num_servers=11 p50=0.09090 p95=0.09100 p99=0.09110 p999=0.09110 AVG=0.09091; P99_AVG_RATIO=1.00210; ITEMS_PER_NODE=909.1 num_servers=12 p50=0.08330 p95=0.08350 p99=0.08350 p999=0.08360 AVG=0.08333; P99_AVG_RATIO=1.00200; ITEMS_PER_NODE=833.3 num_servers=13 p50=0.07690 p95=0.07700 p99=0.07710 p999=0.07720 AVG=0.07692; P99_AVG_RATIO=1.00230; ITEMS_PER_NODE=769.2 num_servers=14 p50=0.07140 p95=0.07160 p99=0.07160 p999=0.07170 AVG=0.07143; P99_AVG_RATIO=1.00240; ITEMS_PER_NODE=714.3 num_servers=25 p50=0.04000 p95=0.04010 p99=0.04020 p999=0.04020 AVG=0.04000; P99_AVG_RATIO=1.00500; ITEMS_PER_NODE=400.0 num_servers=50 p50=0.02000 p95=0.02010 p99=0.02020 p999=0.02020 AVG=0.02000; P99_AVG_RATIO=1.01000; ITEMS_PER_NODE=200.0 num_servers=100 p50=0.01000 p95=0.01010 p99=0.01020 p999=0.01020 AVG=0.01000; P99_AVG_RATIO=1.02000; ITEMS_PER_NODE=100.0 num_servers=1000 p50=0.00100 p95=0.00110 p99=0.00120 p999=0.00120 AVG=0.00100; P99_AVG_RATIO=1.20000; ITEMS_PER_NODE=10.0 power of two choices num_items=100000: num_servers=3 p50=0.33333 p95=0.33334 p99=0.33335 p999=0.33336 AVG=0.33333; P99_AVG_RATIO=1.00005; ITEMS_PER_NODE=33333.3 num_servers=5 p50=0.20000 p95=0.20001 p99=0.20002 p999=0.20003 AVG=0.20000; P99_AVG_RATIO=1.00010; ITEMS_PER_NODE=20000.0 num_servers=10 p50=0.10000 p95=0.10001 p99=0.10002 p999=0.10002 AVG=0.10000; P99_AVG_RATIO=1.00020; ITEMS_PER_NODE=10000.0 num_servers=11 p50=0.09091 p95=0.09092 p99=0.09093 p999=0.09093 AVG=0.09091; P99_AVG_RATIO=1.00023; ITEMS_PER_NODE=9090.9 num_servers=12 p50=0.08333 p95=0.08335 p99=0.08335 p999=0.08336 AVG=0.08333; P99_AVG_RATIO=1.00020; ITEMS_PER_NODE=8333.3 num_servers=13 p50=0.07692 p95=0.07694 p99=0.07694 p999=0.07695 AVG=0.07692; P99_AVG_RATIO=1.00022; ITEMS_PER_NODE=7692.3 num_servers=14 p50=0.07143 p95=0.07144 p99=0.07145 p999=0.07145 AVG=0.07143; P99_AVG_RATIO=1.00030; ITEMS_PER_NODE=7142.9 num_servers=25 p50=0.04000 p95=0.04001 p99=0.04002 p999=0.04002 AVG=0.04000; P99_AVG_RATIO=1.00050; ITEMS_PER_NODE=4000.0 num_servers=50 p50=0.02000 p95=0.02001 p99=0.02002 p999=0.02002 AVG=0.02000; P99_AVG_RATIO=1.00100; ITEMS_PER_NODE=2000.0 num_servers=100 p50=0.01000 p95=0.01001 p99=0.01002 p999=0.01002 AVG=0.01000; P99_AVG_RATIO=1.00200; ITEMS_PER_NODE=1000.0 num_servers=1000 p50=0.00100 p95=0.00101 p99=0.00102 p999=0.00102 AVG=0.00100; P99_AVG_RATIO=1.02000; ITEMS_PER_NODE=100.0

a year ago 21 votes
Nanosecond timestamp collisions are common

I was wondering: how often do nanosecond timestamps collide on modern systems? The answer is: very often, like 5% of all samples, when reading the clock on all 4 physical cores at the same time. As a result, I think it is unsafe to assume that a raw nanosecond timestamp is a unique identifier. I wrote a small test program to test this. I used Go, which records both the "absolute" time and the "monotonic clock" relative time on each call to time.Now(), so I compared both the relative difference between consecutive timestamps, as well as just the absolute timestamps. As expected, the behavior depends on the system, so I observe very different results on Mac OS X and Linux. On Linux, within a single thread, both the absolute and monotonic times always increase. On my system, the minimum increment was 32 ns. Between threads, approximately 5% of the absolute times were exactly the same as other threads. Even with 2 threads on a 4 core system, approximately 2% of timestamps collided. On Mac OS X: the absolute time has microsecond resolution, so there are an astronomical number of collisions when I repeat this same test. Even within a thread I often observe the monotonic clock not increment. See the test program on Github if you are curious.

a year ago 18 votes
How much does the read/write buffer size matter for socket throughput?

The read() and write() system calls take a variable-length byte array as an argument. As a simplified model, the time for the system call should be some constant "per-call" time, plus time directly proportional to the number of bytes in the array. That is, the time for each call should be time = (per_call_minimum_time) + (array_len) × (per_byte_time). With this model, using a larger buffer should increase throughput, asymptotically approaching 1/per_byte_time. I was curious: do real system calls behave this way? What are the ideal buffer sizes for read() and write() if we want to maximize throughput? I decided to do some experiments with blocking I/O. These are not rigorous, and I suspect the results will vary significantly if the hardware and software are different than one the system I tested. The really short answer is that a buffer of 32 KiB is a good starting point on today's systems, and I would want to measure the performance to go beyond that. However, for large writes, performance can increase. On Linux, the simple model holds for small buffers (≤ 4 KiB), but once the program approaches the maximum throughput, the throughput becomes highly variable and in many cases decreases as the buffers get larger. For blocking I/O, approximately 32 KiB is large enough to hit the maximum throughput for read(), but write() throughput improves with buffers up to around 256 KiB - 1 MiB. The reason for the asymmetry is that the Linux kernel will only write less than the entire buffer (a "short write") if there is an error (e.g. a signal causing EINTR). Thus, larger write buffers means the operating system needs to switch to the process less often. On the other head, "short reads", where a read() returns less than the maximum length, become increasingly common as the buffer size increases, which diminishes the benefit. There is a SO_RCVLOWAT socket option to change this that I did not test. The experiments were run on two 16 CPU Google Cloud T2D instances, which use AMD EPYC Milan processors (3rd generation, released in 2021). Each core is a real physical core. I used Ubuntu 23.04 running kernel 6.2.0-1005-gcp. My benchmark program is written in Rust and is available on Github. On localhost, Unix sockets were able to transfer data at approximately 9000 MiB/s. Localhost TCP sockets were a bit slower, around 7000 MiB/s. When using two separate cloud VMs with a networking throughput limit of 32 Gbps = 3800 MiB/s, I needed to use 6 TCP sockets to reliably reach that maximum throughput. A single TCP socket gets around 1400 MiB/s with 256 KiB buffers, with peaks as high as 2200 MiB/s. Experiment 1: /dev/zero and /dev/urandom My first experiment is reading from the /dev/zero and /dev/urandom devices. These are software devices implemented by the kernel, so they should have low overhead and low variability, since other tasks are not involved. Reading from /dev/urandom should be much slower than /dev/zero since the kernel must generate random bytes, rather than just zeros. The chart below shows the throughput for reading from /dev/zero as the buffer size is increased. The results show that the basic linear time per system call model holds until the system reaches maximum throughput (256 kiB buffer = 39000 MiB/s for /dev/zero, or 16 kiB = 410 MiB/s for /dev/urandom). As the buffer size increases further, the throughput decreases as the buffers get too big. This suggests that some other cost for larger buffers starts to outweigh the reduction in number of system calls. Perhaps CPU caches become less effective? The AMD EPYC Milan (3rd gen) CPU I tested on has 32 KiB of L1 data cache and 512 KiB of L2 data cache per core. The performance decreases don't exactly line up with these numbers, but it still seems plausible. The numbers for /dev/urandom are substantially lower, but otherwise similar. I did a linear least-squares fit on the average time per system call, shown in the following chart. If I use all the data, the fit is not good, because the trend changes for larger buffers. However, if I use the data up to the maximum throughput at 256 KiB, the fit is very good, as shown on the chart below. The linear fit models the minimum time per system call as 167 ns, with 0.0235 ns/byte additional time. If we want to use smaller buffers, using a 64 KiB buffer for reading from /dev/zero gets within 95% of the maximum throughput. Experiment 2: Unix and localhost TCP sockets Exchanging data with other processes is the thing I am actually interested in, so I tested Unix and TCP sockets on a single machine. In this case, I varied both the write buffer size and the read buffer size. Unfortunately, these results vary a lot. A more robust comparison would require running each experiment many times, and using some sort of statistical comparison. However, this "quick and dirty" experiment satisfied my curiousity, so I didn't do that. As a result, my conclusions here are vague. The simple model that increasing buffer size should decrease overhead is true, but only until the buffers are about 4 KiB. Above that point, the results start to be highly variable, and it is much harder to draw general conclusion. However, appears that increasing the write buffer size generally is quite helpful up to at least 256 KiB, and often needed as much as 1 MiB to get the highest localhost throughput. I suspect this is because on Linux with blocking sockets, write() will not return until it has written all the data in the buffer, unless there is an error (e.g. EINTR). As a result, passing a large buffer means the kernel can do a lot of the work without needing to switch back to user space. Unfortunately, the same is not true for read(), which often returns "short reads" with any data that is available in the buffer. This starts with buffer sizes around 2 KiB, with the percentage of short reads increasing as the buffer size gets larger. This means the simple model does not hold, because we aren't actually increasing the bytes per read call. I suspect this is a factor which means this microbenchmark is likely not representative of real programs. A real program will do something with the buffer, which will provide time for more data to be buffered in the kernel, and would probably decrease the number of short reads. This likely means larger buffers are in practice more useful than this microbenchmark suggests. As a result of this, the highest throughput often was achievable with small read buffers. I'm somewhat arbitrarily selecting 16 KiB at the best read buffer, and 256 KiB as the best write buffer, although a 1 MiB write buffer seems to be To give a sense of how variable the results are, the plot below shows the local Unix socket throughput for each read and write buffer throughput size. I apologize for the ugly plot. I did not want to spend the time to make it more beautiful. This plot is interactive so you can slice the data to the area of interest. I recommend zooming in to the left hand size with read buffers up to about 300 KiB. The first thing to note is at least on Linux with blocking sockets, the writer will almost never have a "short write", where the write system call returns before writing all the data in the buffer. Unless there is a signal (EINTR) or some other "error" condition, write() will not return until all the bytes are written. The same is not true for reads. The read() system call will often return a "short" read, starting around buffer sizes of 2 KiB. The percentage of short reads generally increases as buffer sizes get bigger, which is logical. Another note is that sockets have in-kernel send and receive buffers. I did not tune these at all. It is possible that better performance is possible by tuning these settings, but that was not my goal. I wanted to know what happens "out of the box" for general-purpose programs without any special tuning. Experiment 3: TCP between two hosts In this experiment, I used two separate hosts connected with 32 Gbps networking in Google Cloud. I first tested the TCP throughput using iperf, to independently verify the network performance. A single TCP connection with iperf is not enough to fully utilize the network. I tried fiddling with some command line options and with Kernel settings like net.ipv4.tcp_rmem and wasn't able to get much better than about 12 Gb/s = 1400 MiB/s. The throughput is also highly varible. Here is some example output with iperf reporting at 2 second intervals, where you can see the throughput ranging from 10 to 19 Gb/s, with an average over the entire interval of 12 Gb/s. To hit the maximum network throughput, I need to use 6 or more parallel TCP connections (iperf -c IP_ADDRESS --time 60 --interval 2 -l 262144 -P 6). Using 3 connections gets around 26 Gb/s, and using 4 or 5 will occasionally hit the maximum, but will also occasionally drop down. Using at least 6 seems to reliably stay at the maximum. Due to this variability, it is hard to draw any conclusions about buffer size. In particular: a single TCP connection is not limited by CPU. The system uses about 40% of a single CPU core, basically all in the kernel. This is more about how the buffer sizes may impact scheduling choices. That said, it is clear that you cannot hit the maximum throughput with a small write buffer. The experiments with 4 KiB write buffers reached approximately 300 MiB/s, while an 8 KiB write buffer was much faster, around 1400 MiB/s. Larger still generally seems better, up to around 256 KiB, which occasionally reached 2200 MiB/s = 17.6 Gb/s. The plot below shows the TCP socket throughput for each read and write buffer size. Again, I apologize for the ugly plot.

a year ago 62 votes

More in programming

Supa Pecha Kucha

slug: supapechakucha

18 hours ago 3 votes
The Power of Principles in Web Development Decision-Making (article)

Discover how The Epic Programming Principles can transform your web development decision-making, boost your career, and help you build better software.

9 hours ago 2 votes
Closing the borders alone won't fix the problems

Denmark has been reaping lots of delayed accolades from its relatively strict immigration policy lately. The Swedes and the Germans in particular are now eager to take inspiration from The Danish Model, given their predicaments. The very same countries that until recently condemned the lack of open-arms/open-border policies they would champion as Moral Superpowers.  But even in Denmark, thirty years after the public opposition to mass immigration started getting real political representation, the consequences of culturally-incompatible descendants from MENAPT continue to stress the high-trust societal model. Here are just three major cases that's been covered in the Danish media in 2025 alone: Danish public schools are increasingly struggling with violence and threats against students and teachers, primarily from descendants of MENAPT immigrants. In schools with 30% or more immigrants, violence is twice as prevalent. This is causing a flight to private schools from parents who can afford it (including some Syrians!). Some teachers are quitting the profession as a result, saying "the Quran run the class room". Danish women are increasingly feeling unsafe in the nightlife. The mayor of the country's third largest city, Odense, says he knows why: "It's groups of young men with an immigrant background that's causing it. We might as well be honest about that." But unfortunately, the only suggestion he had to deal with the problem was that "when [the women] meet these groups... they should take a big detour around them". A soccer club from the infamous ghetto area of Vollsmose got national attention because every other team in their league refused to play them. Due to the team's long history of violent assaults and death threats against opposing teams and referees. Bizarrely leading to the situation were the team got to the top of its division because they'd "win" every forfeited match. Problems of this sort have existed in Denmark for well over thirty years. So in a way, none of this should be surprising. But it actually is. Because it shows that long-term assimilation just isn't happening at a scale to tackle these problems. In fact, data shows the opposite: Descendants of MENAPT immigrants are more likely to be violent and troublesome than their parents. That's an explosive point because it blows up the thesis that time will solve these problems. Showing instead that it actually just makes it worse. And then what? This is particularly pertinent in the analysis of Sweden. After the "far right" party of the Swedish Democrats got into government, the new immigrant arrivals have plummeted. But unfortunately, the net share of immigrants is still increasing, in part because of family reunifications, and thus the problems continue. Meaning even if European countries "close the borders", they're still condemned to deal with the damning effects of maladjusted MENAPT immigrant descendants for decades to come. If the intervention stops there. There are no easy answers here. Obviously, if you're in a hole, you should stop digging. And Sweden has done just that. But just because you aren't compounding the problem doesn't mean you've found a way out. Denmark proves to be both a positive example of minimizing the digging while also a cautionary tale that the hole is still there.

19 hours ago 2 votes
We all lose when art is anonymised

One rabbit hole I can never resist going down is finding the original creator of a piece of art. This sounds simple, but it’s often quite difficult. The Internet is a maze of social media accounts that only exist to repost other people’s art, usually with minimal or non-existent attribution. A popular image spawns a thousand copies, each a little further from the original. Signatures get cropped, creators’ names vanish, and we’re left with meaningless phrases like “no copyright intended”, as if that magically absolves someone of artistic theft. Why do I do this? I’ve always been a bit obsessive, a bit completionist. I’ve worked in cultural heritage for eight years, which has made me more aware of copyright and more curious about provenance. And it’s satisfying to know I’ve found the original source, that I can’t dig any further. This takes time. It’s digital detective work, using tools like Google Lens and TinEye, and it’s not always easy or possible. Sometimes the original pops straight to the top, but other times it takes a lot of digging to find the source of an image. So many of us have become accustomed to art as an endless, anonymous stream of “content”. A beautiful image appears in our feed, we give it a quick heart, and scroll on, with no thought for the human who sweated blood and tears to create it. That original artist feels distant, disconected. Whatever benefit they might get from the “exposure” of your work going viral, they don’t get any if their name has been removed first. I came across two examples recently that remind me it’s not just artists who miss out – it’s everyone who enjoys art. I saw a photo of some traffic lights on Tumblr. I love their misty, nighttime aesthetic, the way the bright colours of the lights cut through the fog, the totality of the surrounding darkness. But there was no name – somebody had just uploaded the image to their Tumblr page, it was reblogged a bunch of times, and then it appeared on my dashboard. Who took it? I used Google Lens to find the original photographer: Lucas Zimmerman. Then I discovered it was part of a series. And there was a sequel. I found interviews. Context. Related work. I found all this cool stuff, but only because I knew Lucas’s name. Traffic Lights, by Lucas Zimmerman. Published on Behance.net under a CC BY‑NC 4.0 license, and reposted here in accordance with that license. The second example was a silent video of somebody making tiny chess pieces, just captioned “wow”. It was clearly an edit of another video, with fast-paced cuts to make it accommodate a short attention span – and again with no attribution. This was a little harder to find – I had to search several frames in Google Lens before I found a summary on a Russian website, which had a link to a YouTube video by metalworker and woodworker Левша (Levsha). This video is four times longer than the cut-up version I found, in higher resolution, and with commentary from the original creator. I don’t speak Russian, but YouTube has auto-translated subtitles. Now I know how this amazing set was made, and I have a much better understanding of the materials and techniques involved. (This includes the delightful name Wenge wood, which I’d never heard before.) https://youtube.com/watch?v=QoKdDK3y-mQ A piece of art is more than just a single image or video. It’s a process, a human story. When art is detached from its context and creator, we lose something fundamental. Creators lose the chance to benefit from their work, and we lose the opportunity to engage with it in a deeper way. We can’t learn how it was made, find their other work, or discover how to make similar art for ourselves. The Internet has done many wonderful things for art, but it’s also a machine for endless copyright infringement. It’s not just about generative AI and content scraping – those are serious issues, but this problem existed long before any of us had heard of ChatGPT. It’s a thousand tiny paper cuts. How many of us have used an image from the Internet because it showed up in a search, without a second thought for its creator? When Google Images says “images may be subject to copyright”, how many of us have really thought about what that means? Next time you want to use an image from the web, look to see if it’s shared under a license that allows reuse, and make sure you include the appropriate attribution – and if not, look for a different image. Finding the original creator is hard, sometimes impossible. The Internet is full of shadows: copies of things that went offline years ago. But when I succeed, it feels worth the effort – both for the original artist and myself. When I read a book or watch a TV show, the credits guide me to the artists, and I can appreciate both them and the rest of their work. I wish the Internet was more like that. I wish the platforms we rely on put more emphasis on credit and attribution, and the people behind art. The next time an image catches your eye, take a moment. Who made this? What does it mean? What’s their story? [If the formatting of this post looks odd in your feed reader, visit the original article]

yesterday 1 votes
Apple does AI as Microsoft did mobile

When the iPhone first appeared in 2007, Microsoft was sitting pretty with their mobile strategy. They'd been early to the market with Windows CE, they were fast-following the iPod with their Zune. They also had the dominant operating system, the dominant office package, and control of the enterprise. The future on mobile must have looked so bright! But of course now, we know it wasn't. Steve Ballmer infamously dismissed the iPhone with a chuckle, as he believed all of Microsoft's past glory would guarantee them mobile victory. He wasn't worried at all. He clearly should have been! After reliving that Ballmer moment, it's uncanny to watch this CNBC interview from one year ago with Johny Srouji and John Ternus from Apple on their AI strategy. Ternus even repeats the chuckle!! Exuding the same delusional confidence that lost Ballmer's Microsoft any serious part in the mobile game.  But somehow, Apple's problems with AI seem even more dire. Because there's apparently no one steering the ship. Apple has been promising customers a bag of vaporware since last fall, and they're nowhere close to being able to deliver on the shiny concept demos. The ones that were going to make Apple Intelligence worthy of its name, and not just terrible image generation that is years behind the state of the art. Nobody at Apple seems able or courageous enough to face the music: Apple Intelligence sucks. Siri sucks. None of the vaporware is anywhere close to happening. Yet as late as last week, you have Cook promoting the new MacBook Air with "Apple Intelligence". Yikes. This is partly down to the org chart. John Giannandrea is Apple's VP of ML/AI, and he reports directly to Tim Cook. He's been in the seat since 2018. But Cook evidently does not have the product savvy to be able to tell bullshit from benefit, so he keeps giving Giannandrea more rope. Now the fella has hung Apple's reputation on vaporware, promised all iPhone 16 customers something magical that just won't happen, and even spec-bumped all their devices with more RAM for nothing but diminished margins. Ouch. This is what regression to the mean looks like. This is what fiefdom management looks like. This is what having a company run by a logistics guy looks like. Apple needs a leadership reboot, stat. That asterisk is a stain.

2 days ago 3 votes