More from Posts on Made of Bugs
About a month ago, the CPython project merged a new implementation strategy for their bytecode interpreter. The initial headline results were very impressive, showing a 10-15% performance improvement on average across a wide range of benchmarks and a variety of platforms. Unfortunately, as I will document in this post, these impressive performance gains turned out to be primarily due to inadvertently working around a regression in LLVM 19. When benchmarked against a better baseline (such as GCC, clang-18, or LLVM 19 with certain tuning flags), the performance gain drops to 1-5% or so depending on the exact setup.
Earlier this month, I used Claude to port (parts of) an Emacs package into Rust, shrinking the execution time by a factor of 1000 or more (in one concrete case: from 90s to about 15ms). This is a variety of yak-shave that I do somewhat routinely, both professionally and in service of my personal computing environment. However, this time, Claude was able to execute substantially the entire project under my supervision, with me writing almost no lines of code, which sped the project up considerably compared to doing it by hand.
Suppose we have a large collection of documents, and we wish to identify which documents are approximately the same as each other. For instance, we may have crawled the web over some period of time, and expect to have fetched the “same page” several times, but to see slight differences in metadata, or that we have several revisions of a page following small edits. In this post I want to explore the method of approximate deduplication via Jaccard similarity and the MinHash approximation trick.
I worked at Stripe for about seven years, from 2012 to 2019. Over that time, I used and contributed to many generations of Stripe’s developer environment – the tools that engineers used daily to write and test code. I think Stripe did a pretty good job designing and building that developer experience, and since leaving, I’ve found myself repeatedly describing features of that environment to friends and colleagues. This post is an attempt to record the salient features of that environment as I remember it.
I was recently introduced to the paper “Seeing the Invisible: Perceptual-Cognitive Aspects of Expertise” by Gary Klein and Robert Hoffman. It’s excellent and I recommend you read it when you have a chance. Klein and Hoffman discuss the ability of experts to “see what is not there”: in addition to observing data and cues that are present in the environment, experts perceive implications of these cues, such as the absence of expected or “typical” information, the typicality or atypicality of observed data, and likely/possible past and future time trajectories of a system based on a point-in-time snapshot or limited duration of observation.
More in technology
A long time ago I wrote about secret government telephone numbers, and before that, secret military telephone buttons. I suppose this is becoming a series. To be clear, the "secret" here is a joke, but more charitably I could say that it refers to obscurity rather than any real effort to keep them secret. Actually, today's examples really make this point: they're specifically intended to be well known, but are still pretty obscure in practice. If you've been around for a while, you know how much I love telephone numbers. Here in North America, we have a system called the North American Numbering Plan (NANP) that has rigidly standardized telephone dialing practices since the middle of the 20th century. The US, Canada, and a number of Caribbean countries benefit from a very orderly system of area codes (more formally numbering plan areas or NPAs) followed by a subscriber number written in the format NXX-XXXX (this is a largely NANP-centric notation for describing phone number patterns, N represents the digits 2-9 and X any digit). All of these NANP numbers reside under the country code 1, allowing at least theoretically seamless international dialing within the NANP community. It's really a pretty elegant system. NANP is the way it is for many reasons, but it mostly reflects technical requirements of the telephone exchanges of the 1940s. This is more thoroughly explained in the link above, but one of the goals of NANP is to ensure that step-by-step (SxS) exchanges can process phone numbers digit by digit as they are dialed. In other words, it needs to be possible to navigate the decision tree of telephone routing using only the digits dialed so far. Readers with a computer science education might have some tidy way to describe this in terms of Chomsky or something, but I do not have a computer science education; I have an Information Technology education. That means I prefer flow charts to automata, and we can visualize a basic SxS exchange as a big tree.
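One way to picture that tree, and the NXX-XXXX notation, is a small Python sketch. This is only an illustration under stated assumptions: the routing table is made up, and the names `build_tree` and `dial` are hypothetical, not anything a real exchange exposes. The point it tries to show is that each digit picks the next edge using nothing but the digits dialed so far.

```python
import re

# NXX-XXXX: N is any digit 2-9, X is any digit 0-9.
NXX_XXXX = re.compile(r"[2-9]\d{2}-\d{4}")

def build_tree(routes):
    """Build a digit tree from {digit_string: destination} (made-up data)."""
    root = {}
    for digits, destination in routes.items():
        node = root
        for d in digits:
            node = node.setdefault(d, {})
        node["dest"] = destination  # mark this node as a leaf
    return root

def dial(tree, digits):
    """Follow one edge per dialed digit, with no lookahead and no digit count."""
    node = tree
    for d in digits:
        if "dest" in node:
            break  # already at a leaf; further digits are ignored
        if d not in node:
            return None  # no such route
        node = node[d]
    return node.get("dest")  # None if dialing stopped partway down the tree

# Hypothetical routing table. No entry may be a prefix of another,
# which is exactly the constraint digit-by-digit routing imposes.
tree = build_tree({"5551234": "Alice", "5559876": "Bob", "0": "operator"})
print(dial(tree, "5551234"))
print(dial(tree, "0"))
print(bool(NXX_XXXX.fullmatch("555-1234")))
```

Because `dial` never looks ahead, a short code such as "0" can only exist if no longer number starts with the same digit, which is the kind of practical constraint a step-by-step design creates.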
When you pick up your phone, you start at the root of the tree, and each digit dialed chooses the edge to follow. Eventually you get to a leaf that is hopefully someone's telephone, but at no point in the process does any node benefit from the context of digits you dial before, after, or how many total digits you dial. This creates all kinds of practical constraints, and is the reason, for example, that we tend to write ten-digit phone numbers with a "1" before them. That requirement was in some ways long-lived (The last SxS exchange on the public telephone network was retired in 1999), and in other ways not so long lived... "common control" telephone exchanges, which did store the entire number in electromechanical memory before making a routing decision, were already in use by the time the NANP scheme was adopted. They just weren't universal, and a common nationwide numbering scheme had to be designed to accommodate the lowest common denominator. This discussion so far is all applicable to the land-line telephone. There is a whole telephone network that is, these days, almost completely separate but interconnected: cellular phones. Early cellular phones (where "early" extends into CDMA and early GSM deployments) were much more closely attached to the "POTS" (Plain Old Telephone System). AT&T and Verizon both operated traditional telephone exchanges, for example 5ESS, that routed calls to and from their customers. These telephone exchanges have become increasingly irrelevant to mobile telephony, and you won't find a T-Mobile ESS or DMS anywhere. All US cellular carriers have adopted the GSM technology stack, and GSM has its own definition of the switching element that can be, and often is, fulfilled by an AWS EC2 instance running RHEL 8. Calls between cell phones today, even between different carriers, are often connected completely over IP and never touch a traditional telephone exchange. 
The point is that not only is telephone number parsing less constrained on today's telephone network, in the case of cellular phones, it is outright required to be more flexible. GSM also defines the properties of phone numbers, and it is a very loose definition. Keep in mind that GSM is deeply European, and was built from the start to accommodate the wide variety of dialing practices found in Europe. This manifests in ways big and small; one of the notable small ways is that the European emergency number 112 works just as well as 911 on US cell phones because GSM dictates special handling for emergency numbers and dictates that 112 is one of those numbers. In fact, the definition of an "emergency call" on modern GSM networks is requesting a SIP URI of "urn:service:sos". This reveals that dialed number handling on cellular networks is fundamentally different. When you dial a number on your cellular phone, the phone collects the entire number and then applies a series of rules to determine what to do, often leading to a GSM call setup process where the entire number, along with various flags, is sent to the network. This is all software-defined. In the immortal words of our present predicament, "everything's computer." The bottom line is that, within certain regulatory boundaries and requirements set by GSM, cellular carriers can do pretty much whatever they want with phone numbers. Obviously numbers need to be NANP-compliant to be carried by the POTS, but many modern cellular calls aren't carried by the POTS, they are completed entirely within cellular carrier systems through their own interconnection agreements. This freedom allows all kinds of things like "HD voice" (cellular calls connected without the narrow filtering and companding used by the traditional network), and a lot of flexibility in dialing. Most people already know about some weird cellular phone numbers. For example, you can dial *#06# to display your phone's various serial numbers. 
This is an example of a GSM MMI (man-machine interface) code, phone numbers that are handled entirely within your device but nonetheless defined as dialable numbers by GSM for compatibility with even the most basic flip phones. GSM also defined numbers called USSD for unstructured supplementary service data, which set up connections to the network that can be used in any arbitrary way the network pleases. Older prepaid phone services used to implement balance check and top-up operations using USSD numbers, and they're also often used in ways similar to Vertical Service Codes (VSCs) on the landline network to control carrier features. USSDs also enabled the first forms of mobile data, which involved a "special telephone call" to a USSD in order to download a cut-down form of ESPN in a weird mobile-specific markup language. Now, put yourself in the shoes of an enterprising cellular network. The flexibility of processing phone numbers as you please opens up all kinds of possibilities. Innovative services! Customer convenience! Sell them for money! Oh my god, sell them for money! It seems like this started with customer service. It is an old practice, dating to the Bell operating companies, to have special short phone numbers to reach the telephone company itself. The details varied by company (often based on technical constraints in their switching system), but a common early setup was that dialing 114 got you the repair service operator to report a problem with your phone line. These numbers were usually listed in the front of the phone book, and for the phone company the fact that they were "special" or nonstandard was sort of a feature, since they could ensure that they were always routed within the same switch. The selection of "911" as the US emergency number seems rooted in this practice, as later on several major telcos used the "N11" numbers for their service lines. 
This became immortalized in the form of 611, which will get you customer service for most phone carriers. So cellular companies did the same, allocating themselves "special" numbers for various service lines. Verizon offers #PMT to make a payment. Naturally, there's also room for upsell services: #ROAD for roadside assistance on Verizon. The odd thing about these phone numbers is that there's really no standard involved, they're just the arbitrary practices of specific cellular companies. The term "mobile dial code" (MDC) is usually used to refer to them, although that term seems to have arisen organically rather than by intent. Remember, these aren't a real thing! The carriers just make them up, all on their own. The only real constraint on MDCs is that they need to not collide with any POTS number, which is most easily achieved by prefixing them with some combination of * and #, and usually not "*#" because it's referenced by the GSM standard for MMI. MDCs are available for purchase, but the terms don't seem to be public and you have to negotiate separately with each carrier. That's because there is no centralization. This is where MDCs stand in clear contrast to the better known SMS Short Code, or SMSSC. Those are the five or six-digit numbers widely used in advertising campaigns. SMSSCs are centrally managed by the SMS Short Code Registry, which is a function of industry association CTIA but contracted to iConectiv. iConectiv is sort of like the SAIC of the communications industry, a huge company that dates back to the Bell System (where it became Bellcore after divestiture) and that no one has heard of but nonetheless is a critically important part of the telephone system. Providers that want to have an SMSSC (typically on behalf of one of their customers) pay a fee, and usually recoup it from the end user. That fee is not cheap, typical end-user rates for an SMSSC run over $10k a year. 
But at least it's straightforward, and your SMS A2P or marketing company can make it happen for you. MDCs have no such centralization, no standardized registration process. You negotiate with each carrier individually. That means it's pretty difficult to put together "complete coverage" on an MDC by getting the same one assigned by every major carrier. And this is one of those areas where "good enough" is seldom good enough; people get pissed off when something you advertise doesn't work. Putting a phone number that only works for some people on a billboard can quickly turn into an expensive embarrassment, so companies will be wary of using an MDC in marketing if they don't feel really confident that it works for the vast majority of cellphone users. Because of this fragmentation, adoption of MDCs for marketing purposes has been very low. The only going concern I know of is #250, operated by a company called Mobile Direct Response. The premise of #250 is very simple: users call #250 and are greeted by a simple IVR. They say a keyword, and they're either forwarded to the phone number of the business that paid for the keyword or they receive a text message response with more information. #250 is specifically oriented towards radio advertising, where asking people to remember a ten-digit phone number is, well, asking a lot. It's also made the jump to podcast advertising. #250 is priced in a very radio-centric way, by the keyword and the size of the market area in which the advertisement that gives the keyword is played. #250 was founded by Dave Robinett, who used to work on marketing at Sprint, presumably where he became aware that these MDCs were a possibility. He has negotiated for #250 to work across a substantial list of cellular carriers in the US and Canada, providing almost complete coverage. That wasn't easy: Robinett said in an interview that it took five years to get AT&T, T-Mobile, Verizon, and Sprint on board.
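To pull together the kinds of dialed strings mentioned so far (emergency numbers, MMI codes like *#06#, N11 service codes, carrier-invented MDCs, and ordinary NANP numbers), here is a rough Python sketch of a rule chain that sorts a dialed string into categories. It is a simplification for illustration only, not the actual GSM logic or any carrier's implementation: the function name `classify`, the category labels, and the exact patterns are all assumptions, and USSD strings are not modeled at all.

```python
import re

# Both are named above as emergency numbers on US cell phones.
EMERGENCY = {"911", "112"}

def classify(dialed: str) -> str:
    """Sort a dialed string into a rough category (labels are hypothetical)."""
    # The phone has the whole string before deciding anything,
    # so punctuation can be stripped and the result inspected at once.
    s = re.sub(r"[\s\-().]", "", dialed).upper()

    if s in EMERGENCY:
        return "emergency"
    if s.startswith("*#"):
        # "*#" is avoided for MDCs because of its use in MMI codes.
        return "mmi"
    if re.fullmatch(r"[*#]{1,2}[0-9A-Z]+", s):
        # Carrier-defined code: a * or # prefix keeps it from
        # colliding with any POTS number.
        return "mdc"
    if re.fullmatch(r"[2-9]11", s):
        return "n11"
    if re.fullmatch(r"1?[2-9]\d{2}[2-9]\d{6}", s):
        # Ten-digit NANP number (NPA + NXX-XXXX), optional leading 1.
        return "nanp"
    return "unknown"

for example in ["*#06#", "#250", "611", "112", "1 (505) 555-0123"]:
    print(example, "->", classify(example))
```

The ordering of the rules matters: 911 also fits the N11 pattern, so the emergency check has to come first. A step-by-step exchange could not work this way, since it would have committed to a route before seeing the whole string.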
#250 does not appear to be especially widely used. For one, the website is a little junky, with some broken links and other indications that it is not backed by a large communications department. Dave Robinett may be the entire company. They've been operating since at least 2017, and I've only ever heard it in an ad once---a podcast ad that ended with "Call #250 and say I need a dentist." One thing you quickly notice when you look into telephone marketing is that dentists are apparently about 80% of the market. Robinett does mention success with shows like "Rush, Hannity, and Levin," so it's safe to say that my radio habits are a little different from his. That's not to say that #250 is a failure. In the same interview Robinett says that the company pays his mortgage and, well, that ain't too bad. But it's also nothing like the widespread adoption of SMSSCs. One wonders if the limitation of MDCs to one company that is so focused on radio marketing limits their potential. It might really open things up if some company created a registration service, and prenegotiated terms with carriers so that companies could pick up their own MDCs to use as they please. Well, yeah, someone's trying. Around 2006, a recently-founded mobile marketing company called Zoove announced StarStar dialing. I'm a little unclear on Zoove's history. It seems that they were originally founded as Teleractive in Rhode Island as an SMS short code keyword response service, and after an infusion of VC cash moved to Palo Alto and started looking for something bigger. In 2016, they were acquired by a call center technology company called Mindful. Or maybe Zoove sold the StarStar business to Mindful? Stick a pin in that. I don't love the name StarStar, which has shades of Spacestar Ordering. But it refers to their chosen MDC prefix, two stars. Well, that point is a little odd: according to their marketing material you can also get numbers with a # prefix or * prefix, but all of the examples use **.
I would say that, in general, StarStar has it a little less together than #250. Their website is kind of broken: it only loads intermittently and some of the images are missing. At one point it uses the term "CADC" to describe these numbers but I can't find that expanded anywhere. Plus the "About" page refers repeatedly to Virtual Hold Technologies, which renamed to VHT in 2018 and Mindful in 2022. It really feels like the vestigial website of a dead company. I know about StarStar because, for a time, trucks from moving franchise All My Sons prominently bore the number **MOVE on the side. Indeed, this is still one of the headline examples on the StarStar website, but it doesn't work. I just get a loud click and then the call ends. And it's not that StarStar doesn't work with my mobile carrier, because StarStar's own number **MOBILE does connect to their IVR. That IVR promises that a representative will speak with me shortly, plays about five seconds of hold music, and then dumps me on a voicemail system. Despite StarStar numbers apparently basically working, I'm finding that most of the examples they give on their website won't even connect. Perhaps results will vary depending on the mobile network. Well, perhaps not that much is lost. StarStar was founded by Steve Doumar, a serial telephone marketing entrepreneur with a colorful past founding various inbound call center companies. Perhaps his most famous venture is R360, a "lead acquisition" service memorialized by headlines like "Drug treatment referral service took advantage of addictions to make a quick buck" from the Federal Trade Commission. He's one of those guys whose bio involves founding a new company every two years, which he has to spin as entrepreneurial dynamism rather than some combination of fleeing dissatisfied investors and fleeing angered regulators.
Today he runs whisp.io, a "customer activation platform" that appears to be a glorified SMS advertising service featuring something ominously called "simplified opt-in." Whisp has a YouTube channel which features the 48-second gem "Fun Fact We Absolutely Love About Steve Doumar". Description: Our very own CEO, Steve Doumar is a kind and generous person who has given back to the community in many ways; this man is absolutely a man with a heart of gold. Do you want to know the fun fact? Yes you do! Here it is: "He is an incredible philanthropist. He loves helping other people. Every time I'm with him he comes up with new ways and new ideas to help other people. Which I think is amazing. And he doesn't brag about it, he doesn't talk about it a lot." Except he's got his CMO making a YouTube video about it? From Steve Doumar's blog: American entrepreneur Ray Kroc expressed the importance of persisting in a busy world where everyone wants a bite of success. This man is no exception. An entrepreneur. A family man. A visionary. These are the many names of a man that has made it possible for opt-ins to be safe, secure, and accurate; Steve Doumar. I love this stuff, you just can't make it up. I'm pretty sure what's going on here is just an SEO effort to outrank the FTC releases and other articles about the R360 case when you search for his name. It's only partially working, "FTC Hits R360 and its Owner With $3.8 Million Civil ..." still comes in at Google result #4 for "Steve Doumar," at least for me. But hey, #4 is better than #1. Well, to be fair to StarStar, I don't think Steve Doumar has been involved for some years, but also to be fair, some of their current situation clearly dates to past behavior that is maybe less than savory. Zoove originally styled itself as "The National StarStar Registry," clearly trying to draw parallels to CTIA/iConectiv's SMSSC registry. 
Their largest customer was evidently a company called Sumotext, which leased a number of StarStar numbers to offer an SMS and telephone marketing service. In 2016, Sumotext sued StarStar, Zoove, VHT (now Mindful), and a healthy list of other entities all involved in StarStar including the intriguingly named StarSteve LLC. I'm not alone in finding the corporate history a little baffling; in a footnote on one ruling the court expressed confusion about all the different names and opted to call them all Zoove. In any case, Sumotext alleged that Zoove, StarSteve, and VHT all merged as part of a scheme to illegally monopolize the StarStar market by undercutting the companies that had been leasing the numbers and effectively giving VHT (Mindful) an exclusive ability to offer marketing services with StarStar numbers. The case didn't end up going anywhere for Sumotext: the jury found that Sumotext hadn't established a relevant market, which is a key part of a Sherman Act case. An appeal was made all the way to the Supreme Court, but they didn't take it up. What the case did do was publicize some pretty sketchy sounding details, like the seemingly uncontested accusation that VHT got Sumotext's customer list from the registry database and used it to convert them all into StarSteve customers. And yes, the Steve in StarSteve is Steve Doumar. As best I can tell, the story here is that Steve Doumar founded Zoove (or bought Teleractive and renamed it or something?) to establish the National StarStar Registry, then founded a marketing company called StarSteve that resold StarStar numbers, then merged StarSteve and the National StarStar Registry together and cut off all of the other resellers. Apparently not a Sherman Act violation but it sure is a bad look, and I wonder how much it contributed to the lack of adoption of the whole StarStar idea---especially given that Sumotext seems to have been responsible for most of that adoption, including the All My Sons deal for **MOVE.
I wonder if All My Sons had to take **MOVE off of their trucks because of the whole StarSteve maneuver? That seems to be what happened. Look, ten-digit phone numbers are hard to remember, that much is true. But as is, the "MDC" industry doesn't seem stable enough for advertising applications where the number needs to continue to work into the future. I think the #250 service is probably here to stay, but confined to the niche of audio advertising. StarStar raised at least $30 million in capital in the 2010s, but seems to have shot itself in the foot. StarStar owner VHT/Mindful, now acquired by Medallia, doesn't even mention StarStar as a product offering. Hey, remember how Steve Doumar is such a great philanthropist? There are a lot of vestiges around of StarStar Inc., a nonprofit that made StarStar numbers available to charitable organizations. Their website, starstar.org, is now a Wix error page. You can find old articles about StarStar Me, also written **me, which sounds lewd but was a $3/mo offering that allowed customers to get a vanity short code (such as ** followed by their name)---the original form of StarStar, dating back to 2012 and the beginning of Zoove. In a press release announcing StarStar Me, Zoove CEO Joe Gillespie said: With two-thirds of smartphone users having downloaded social networking apps to their phones, there’s a rapidly growing trend in today's on-the-go lifestyle to extend our personal communications and identity into the digital realm via our mobile phones. And somehow this leads to paying $3 to get StarStarred? I love it! It's so meaningless! And years later it would be StarStar Mobile formerly Zoove by VHT now known as Mindful a Medallia company. Truly an inspiring story of industry, and just one little corner of the vast tapestry of phone numbers.
It is a story as old as time (or at least the 1960s): kid gets an RC car for Christmas and excitedly takes it for a spin, but crashes it into a wall within five minutes and tears ensue. The automotive industry has cut down on accidents by implementing automatic emergency braking safety features, so why […] The post This automatic emergency braking system protects RC cars appeared first on Arduino Blog.
The shortest distance between your thoughts and the printed word.
It isn’t a secret that many kids find math to be boring and it is easy for them to develop an attitude of “when am I ever going to use this?” But math is incredibly useful in the real world, from blue-collar machinists using trigonometry to quantum physicists unveiling the secrets of our universe through […] The post This unique electronic toy helps children learn their shapes appeared first on Arduino Blog.