106
The ThinkPad T430 has a few options for running it with an external display: a VGA port, which is pretty much obsolete at this point; a mini DisplayPort connector on the laptop itself; and DVI or DisplayPort on a dock. The mini DisplayPort port has annoyed me for as long as I’ve had this machine. Most places where I’ve had to present something only offer an HDMI cable, which means that I always have to carry a dongle around, and I keep forgetting to bring one everywhere I happen to go. Until now. I have a few of these SATA HDD adapters that replace the optical drive on the ThinkPad T430, and I discovered that my mini DisplayPort to HDMI adapter can fit in one without a problem.[^1] I don’t currently have a need for a second SSD in my T430, so this “mod” makes perfect sense to me. I bet there is someone out there who is capable of routing an actual HDMI port in place of this adapter. The existence of FHD mods for this laptop suggests that this is possible. If you’re that person and created that...
10 months ago

More from ./techtipsy

The coffee machine ran out of memory

After looking into an incident involving Kubernetes nodes running out of memory, I took a trip to the office kitchen to take a break and get a cup of the good stuff. My teammate got their drink first, and then it was my turn. Why is there a Windows 98 themed pop-up on the screen? I wanted to get my coffee, so I tapped on the small OK button. That may have forced the poor coffee machine to start swapping, for which I felt a little bit guilty. The UI was catching up with previous animations, and I got to the drink selection. None of the buttons worked. I reckon something critical crashed in the background. After looking into an incident involving a coffee machine running out of memory, I took a trip to the other office kitchen to take a break and get a cup of the good stuff. That one was fine. Guess it ran on something else than Java. laugh_track.mp3

2 weeks ago 15 votes
About the time I trashed my mother's laptop

Around 2003, my mother had a laptop: the Compaq Armada 1592DT. It ran Windows Me, the worst Windows to ever exist, and had a whopping 96 MB of RAM and a 3 GB hard drive. My mother used it for important stuff, and I played games on it. Given the limitations of the 3 GB hard drive, this soon led to a conflict: there was no room to store any new games! I did my best to make additional room by running the disk cleaner utility, disabling unnecessary Windows features and deleting some PDF catalogues that my mother had downloaded, but there was still a constant lack of space. Armed with a lack of knowledge about computers, I went further and found a tool that promised to make more room on the hard drive. I can’t remember what it was, but it had a nice graphical user interface where the space on the drive was represented as a pie chart. To my amazement, I could slide that pie chart to make it so that 90% of the drive was free space! I went full speed ahead with it. What followed was a crash and upon rebooting I was presented with a black screen. Oops. My mother ended up taking it to a repair shop for 1200 EEK, which was a lot of money at the time. The repair shop ended up installing Windows 98 SE on it, which felt like a downgrade at the time, but in retrospect it was an improvement over Windows Me. I had no idea what I was doing at the time, but I assume that the tool I was playing with was some sort of a partition manager that had no safeguards in place to avoid shrinking and reformatting operating system partitions. Or if it did, then it made ignoring the big warning signs way too easy. Still 100% user error on my part. If only I had known that reinstalling Windows was a relatively simple operation at the time, but it took a solid 4-5 years until I did my first installation of Windows all by myself.

a month ago 21 votes
Fairphone Fairbuds XL review: admirable goals, awful product

I bought the Fairphone Fairbuds XL with my own money at a recent sale for 186.75 EUR, plus 15 EUR for shipping to Estonia. The normal price for these headphones is 239 EUR. This post is not sponsored. I admire what Fairphone wants to achieve, even going as far as getting the Fairphone 5 as a replacement for my iPhone X. Failing to repair my current headphones, I went ahead and decided to get the Fairphone Fairbuds XL as they also advertise the active noise-cancelling feature, and I like the Fairphone brand. Disclaimer: this review is going to be entirely subjective and based on my opinions and experiences with other audio products in the past. I also have tinnitus.[1] I consulted the rtings.com review before purchasing the product to get an idea about what to expect as a consumer. The comparison headphones The main point of comparison for this review is going to be the Sony WH-1000XM3, which are premium high-end wireless Bluetooth headphones, with active noise-cancelling (before that feature broke). These headphones retailed at a higher price during 2020 (about 300-400 EUR) so they are technically a tier above the Fairbuds XL, but given that their successor, the WH-1000XM4, can be bought for 239 EUR new (and often about 200-ish EUR on sale!), it is a fair comparison in my view. After I replaced the ear cushions on my Sony WH-1000XM3 headset, the active noise-cancelling feature started being flaky (popping and loud noises occurring with NC on). No amount of cleaning or calibrating fixed it, and even the authorized repair shop could not do anything about it. I diagnosed the issue to be with the internal noise-cancelling microphones and found that these microphones failing is a very common issue with these headsets, even with newer versions. I am unable to compare the active noise-cancelling performance side-by-side, but I can say that the NC performance on the Sony WH-1000XM3 was simply excellent when it did work, no doubt about it.
The Fairphone shop experience The first issue I had with the product was actually buying it. For some reason, the form would not accept my legal name, which has the letter “Õ” in it, a common vowel in Estonia. Knowing how poorly JavaScript-based client-side validation can be built, I pulled a pro gamer move and copy-pasted my name into the form, which bypassed the faulty check altogether. A similar issue occurred with the address field, as we also have the letter “Ä” (and “Ö”, “Ü”, for that matter). The name I can understand why Fairphone went with the name “Fairbuds XL”, it kind of made sense in their audio product line, and Apple set a precedent with AirPods Max. However, there is such a big missed opportunity here: they could’ve called the product… Fairphones. Yes, it would cause some confusion about their other product line, which is the Fairphone, but at least I would find the name more amusing. Packaging The packaging for the headphones is quite similar to what you’d get with the Fairphone 5: lots of cardboard and seemingly no plastic or otherwise problematic materials. Aside from the headphones themselves, you also get a nice egg bag, meant to protect your headphones when travelling with them. It’s okay, but nothing special, and it won’t protect your headphones from physical damage should they fall or get thrown around in a backpack. The Sony headphones come with a solid hard case, which has done a fantastic job of protecting the headphones over the last 4 years. Longevity of a device depends both on repairability and durability, which is why a hard case would benefit the Fairbuds XL a lot. Factory defect My experience with the Fairbuds XL was off to a rocky start. I noticed that the USB-C cable that connects both sides of the headphones was inserted incorrectly. The headphones worked fine, but you could feel the flat USB-C cable being twisted inside the headband.
The fix to this was to carefully push the headband back, disconnect the USB-C cable from the headphones, flip the cable around and reconnect it. Not a good first impression, but at least the fix was simple enough. Fit and feel The Fairbuds XL are not as comfortable as the reference headphones. The ear cushions and headrest are quite hard and not as soft as on the Sony WH-1000XM3. If you get the fit just right, then you probably won’t have issues with wearing these for a few hours at a time, but I found myself adjusting these often to stop them from hurting my ears and head even during a short test. The ear cups lack any kind of swiveling, which is likely contributing to the comparatively poor fit. Our ears are angled ever-so-slightly forwards, and the Sony WH-1000XM3 feels so much better on the ears as a result of its swiveling aspect. I also noticed that you can hear some components inside the headphones rattling when moving your head. This noise is very noticeable even during music playback and you don’t need to move your head a lot to hear that rattling. In my view, this is a serious defect in the product. When the headphones are folded in, the USB-C cable gets bent in the process and gets forced against one of the ear cushions. I suspect that within months or years of use, either the cable will fail or the ear cushion will get a permanent imprint of the USB-C cable position. The sound I’m not impressed with the sound that the Fairbuds XL produce. They are not in the same class as the Sony WH-1000XM3, with the default equalizer sounding incredibly bland. Most instruments and sounds are bland and not as clear. That’s the best I can describe it as. The Fairbuds app can be used to tune the sound via the equalizer, and out of all the presets I’ve found “Boston” to be the most pleasant one to use. Unfortunately the UI does not show how the presets customize the values in the equalizer, which makes tweaking a preset all that much harder.
Compared to the Sony WH-1000XM3, I miss the crispy sound and the all-encompassing bass, which can really bring all the satisfying details out. The fact that I had used the Sony headphones for almost 5 years at this point may also just mean that I had gotten used to how they sound. Active noise-cancelling The active noise-cancelling performance is nowhere near the Sony WH-1000XM3-s. The effect is very minor, and you’ll be hearing most of the surrounding sounds. Touching the active noise-cancelling microphones on the sides of the headphones will also make a loud sound inside the speaker, and walking around in a room will result in the headphones making wind noises. Because of this, I consider the active noise-cancelling functionality to be functionally broken. Microphone quality I used the Fairbuds XL in a work call, and based on feedback from other attendees, the microphone quality over Bluetooth can be categorized as barely passable, getting a solid 2 points out of 5. To be fair, Bluetooth microphone quality is also not great on the Sony WH-1000XM3-s, but compared to the Fairphone Fairbuds XL, they are still subjectively better. Fairbuds app The Fairbuds app is very simple, and you’d mainly want to use it for setting the equalizer settings and upgrading the firmware. The rest of the functionality seems to be a bunch of links to Fairphone articles and guides. The first time I installed the app, it told me that a firmware upgrade version V90 is available. During the first attempt, the progress bar stopped. Second attempt: it almost reached the end and did not complain about a firmware upgrade being available after that. Third attempt came after I had reinstalled the app. And there it was, the version V90 update, again. This time it got stuck at 1%. I’m probably still on the older version of the firmware, but I honestly can’t tell. Bluetooth multi-device connecting This is a feature that I didn’t know I needed in my life.
With the reference Sony WH-1000XM3-s, whenever I wanted to switch where I listen to music from, I had to disconnect from my phone and then reconnect on the desktop, which was an annoying and manual process. With the Fairbuds XL, I can connect the headphones to both my laptop and phone and play media wherever, the headphones will switch to whichever device I’m actually using! This, too, has its quirks, and there might be a small delay when playing media on the other device, but I’ve grown so accustomed to using this feature now and can’t imagine myself going back to using anything else. This feature is not unique to the Fairbuds XL as other modern wireless headphones are also likely to boast this feature, but this is the first time I’ve had the opportunity to try this out myself. It’s a tremendous quality of life improvement for me. However, this, too, is not perfect. If I have the headphones connected to my phone and laptop, and I change to headset mode on the laptop for a meeting, then the playback on the phone will be butchered until I completely disconnect the headphones from the laptop. This seems like a firmware issue to me. The controls The Fairbuds XL has one button and one joystick. The button controls the active noise-cancelling settings (NC on, Ambient sound, NC off), plus the Bluetooth pairing mode. The joystick is used to turn the device on, switch songs and control the volume, and likely some other settings that relate to accepting calls and the like. Coming from the Sony WH-1000XM3, I have to say that I absolutely LOVE having physical buttons again! It’s so much easier to change the volume level, skip songs and start/stop playback with a physical button compared to the asinine touch surface solution that Sony has going on. The joystick is not perfect, skipping a song can be a little bit tricky due to how the joystick is positioned, you can’t always get a good handle due to your fingers hitting the rest of the headphone assembly. 
That’s the only concern I have with it. If the joystick was a little bit concave and larger, then that may make some of these actions easier for those of us with modest/large thumbs. The audio cue for skipping songs is a bit annoying and cannot seemingly be disabled. The sound effect resembles someone hitting a golf ball with a very poor driver. The ANC settings button is alright, but it’s not possible to quickly cycle between the three modes, you will have to fully listen to the nice lady speaking and then you can move on to the next setting. I wish that clicking the button in rapid succession would skip through the modes faster. USB-C port functionality I was curious to see if the Fairbuds XL worked as normal headphones if I just connected them up to my PC using a USB-C cable. To my surprise, they did! The audio quality was not as good as with Bluetooth, and the volume controls depended on which virtual device you select in your operating system. The Sony WH-1000XM3 do not work like this, the USB-C port is for charging only as far as I’ve tested, but it does have an actual 3.5mm port for wired use. When connected over Bluetooth and you connect a charging cable, the Fairbuds XL will pause momentarily and then continue playback while charging the battery. This is incredibly handy for a wireless device, especially in situations where you have an important meeting coming up and you’re just about to run out of battery. The Sony WH-1000XM3 will simply power off when you connect a charger cable, rendering them unusable while charging. Annoying issues For some reason, whenever I charge my Fairbuds XL, they magically turn on again and I have to shut them off a second time. I’m never quite sure if I’ve managed to shut the headphones off. It does the jingle that indicates that it’s powered off, but then I come back to it later and I find that they’re powered on again. 
Customer care experience I was so unhappy with the product that I tried out the refunding process for the Fairphone Fairbuds XL. I ordered the Fairbuds XL on 2025-02-10 and I received them on 2025-02-14, shipped to Estonia. According to Fairphone’s own materials, I can return the headphones without any questions asked, assuming that my use of them matches what can be done at a physical store. For Fairphone Products, including gift cards, you purchased on the Fairphone Webshop, you have a legal right to change your mind within 14 days and receive a refund amounting to the purchase price of the products and the costs of delivery and return. You are entitled to cancel your purchase within fourteen (14) days from the day the products were delivered to you, without explanation and without any penalties. In the case of a Cool-off, Fairphone may reduce the refund of the purchase price (including delivery costs) to reflect any reduction in the value of the Products, if this has been caused by your handling them in a way which would not normally be permitted in a shop. This means You are entitled to turn on and inspect Your purchased device to familiarise yourself with its properties and ensure that it is working correctly – comparable to the conditions that are permitted within a shop. I followed their instructions and filed a support ticket on 2025-02-16. On 2025-02-25, I had not yet received any contact from Fairphone and I asked them again under the same ticket. On 2025-03-07, I received an automated message that apologized for the delay and asked me to not make any additional tickets on the matter. I’m still waiting for an update for the support ticket over a month later, while the headphones sit in the original packaging. Based on the experiences by others in the Fairphone community forum, it seems that unacceptably large delays in customer service are the norm for Fairphone. 
Fairphone, if you want to succeed as a company, you need to make sure that the one part of your company that’s directly interfacing with your actual paying customers is appropriately staffed and resourced. A bad customer support experience can turn off a brand evangelist overnight. Closing thoughts I want Fairphone to succeed in their mission, but products like these do not further the cause. The feature set of the Fairbuds XL seems competent, and I’m willing to give a pass on a few minor issues if the overall experience is good, but the unimpressive sound profile, broken active noise-cancelling mode, multiple quality issues and poor customer service mean that I can’t in good conscience recommend the Fairphone Fairbuds XL, not even on sale. Perhaps fewer resources should be spent on rebranding and more on engineering good products.
[1] Remember dubstep being a thing? Yeah, so do I. That, plus a little bit of mandatory military service can do a lot of damage to hearing.

a month ago 24 votes
I yearn for the perfect home server

I’ve changed my home server setup a lot over the past decade, mainly because I keep changing the goals all the time. I’ve now realized why that keeps happening. I want the perfect home server. What is the perfect home server? I’d phrase it like this: The perfect home server uses very little power, offers plenty of affordable storage and provides a lot of performance when it’s actually being relied upon. In my case, low power means less than 5 W while idling, 10+ TB of redundant storage for data resilience and integrity concerns, and performance means about 4 modern CPU cores’ worth (low-to-midrange desktop CPU performance). I seem to only ever get one or two at most. Low power usage? Your performance will likely suffer, and you can’t run too many storage drives. You can run SSD-s, but they are not affordable if you need higher capacities. Lots of storage? Well, there goes the low power consumption goal, especially if you run 3.5" hard drives. Lots of performance? Lots of power consumed! There’s just something that annoys me whenever I do things on my home server and I have to wait longer than I should, and yet I’m bothered when my monitoring tells me that my home server is using 50+ watts.1 I keep an eye out for developments in the self-hosting and home server spaces with the hopes that I’ll one day stumble upon the holy grail, that one server that fits all my needs. I’ve gotten close, but no matter what setup I have, there’s always something that keeps bothering me. I’ve seen a few attempts at the perfect home server, covered by various tech reviewers, but they always have at least one critical flaw. Sometimes the whole package is actually great, the functionality rocks, and then you find that the hardware contains prototype-level solutions that result in the power consumption ballooning to over 30 W. Or the price is over 1000 USD/EUR, not including the drives. 
Or it’s only available in certain markets and the shipping and import duties destroy its value proposition. There is no affordable platform out there that provides great performance, flexibility and storage space, all while being quiet and using very little power.2 Desktop PC-s repurposed as home servers can provide room for a lot of storage, and they are by design very flexible, but the trade-off is the higher power consumption of the setup. Single board computers use very little power, but they can’t provide a lot of performance and connecting storage to them gets tricky and is overall limited. They can also get surprisingly expensive. NAS boxes provide a lot of storage space and are generally low power if you exclude the power consumption of hard drives, but the cheaper ones are not that performant, and the performant ones cost almost as much as a high-end PC. Laptops can be used as home servers, they are quite efficient and performant, but they lack the flexibility and storage options of desktop PC-s and NAS boxes. You can slap a USB-based DAS to it to add storage, but I’ve had poor experiences with these under high load, meaning that these approaches can’t be relied on if you care about your data and server stability. Then there’s the option of buying used versions of all of the above. Great bang for buck, but you’re likely taking a hit on the power efficiency part due to the simple fact that technology keeps evolving and getting more efficient. I’m still hopeful that one day a device exists that ticks all the boxes while also being priced affordably, but I’m afraid that it’s just a pipe dream. There are builds out there that fill in almost every need, but the parts list is very specific and the bulk of the power consumption wins come from using SSD-s instead of hard drives, which makes it less affordable. In the meantime I guess I’ll keep rocking my ThinkPad-as-a-server approach and praying that the USB-attached storage does not cause major issues. 
[1] Perhaps it’s an undiagnosed medical condition. Homeserveritis?
[2] If there is one, then let me know, you can find the contact details below!

2 months ago 30 votes
Turns out that I'm a 'prolific open-source influencer' now

Yes, you read that right. I’m a prolific open-source influencer now. Some years ago I set up a Google Alert with my name, for fun. Who knows what it might show one day? On 7th of February, it fired an alert. Turns out that my thoughts on Ubuntu were somewhat popular, and they ended up being ingested by an AI slop generator over at Fudzilla, with no links back to the source or anything.[1] Not only that, but their choice of large language model (read: spicy autocomplete confabulation bot) completely butchered the article, leaving out critical information, which led to one reader gloating about Windows. Not linking back to the original source? Not a good start. Misrepresenting my work? Insulting. Giving a Windows user the opportunity to boast about how happy they are with using it? Absolutely unacceptable. Here’s the full article in case they ever delete their poor excuse of a “news” “article”.
[1] Two can play at that game.

2 months ago 29 votes

More in technology

2025-05-11 air traffic control

Air traffic control has been in the news lately, on account of my country's declining ability to do it. Well, that's a long-term trend, resulting from decades of under-investment, severe capture by our increasingly incompetent defense-industrial complex, no small degree of management incompetence in the FAA, and long-lasting effects of Reagan crushing the PATCO strike. But that's just my opinion, you know, maybe airplanes got too woke. In any case, it's an interesting time to consider how weird parts of air traffic control are. The technical, administrative, and social aspects of ATC all seem two notches more complicated than you would expect. ATC is heavily influenced by its peculiar and often accidental development, a product of necessity that perpetually trails behind the need, and a beneficiary of hand-me-down military practices and technology. Aviation Radio In the early days of aviation, there was little need for ATC---there just weren't many planes, and technology didn't allow ground-based controllers to do much of value. There was some use of flags and signal lights to clear aircraft to land, but for the most part ATC had to wait for the development of aviation radio. The impetus for that work came mostly from the First World War. Here we have to note that the history of aviation is very closely intertwined with the history of warfare. Aviation technology has always rapidly advanced during major conflicts, and as we will see, ATC is no exception. By 1913, the US Army Signal Corps was experimenting with the use of radio to communicate with aircraft. This was pretty early in radio technology, and the aircraft radios were huge and awkward to operate, but it was also early in aviation and "huge and awkward to operate" could be similarly applied to the aircraft of the day. Even so, radio had obvious potential in aviation. The first military application for aircraft was reconnaissance. 
Pilots could fly past the front to find artillery positions and otherwise provide useful information, and then return with maps. Well, even better than returning with a map was providing the information in real-time, and by the end of the war medium-frequency AM radios were well developed for aircraft. Radios in aircraft led naturally to another wartime innovation: ground control. Military personnel on the ground used radio to coordinate the schedules and routes of reconnaissance planes, and later to inform on the positions of fighters and other enemy assets. Without any real way to know where the planes were, this was all pretty primitive, but it set the basic pattern that people on the ground could keep track of aircraft and provide useful information. Post-war, civil aviation rapidly advanced. The early 1920s saw numerous commercial airlines adopting radio, mostly for business purposes like schedule coordination. Once you were in contact with someone on the ground, though, it was only logical to ask about weather and conditions. Many of our modern practices like weather briefings, flight plans, and route clearances originated as more or less formal practices within individual airlines. Air Mail The government was not left out of the action. The Post Office operated what may have been the largest commercial aviation operation in the world during the early 1920s, in the form of Air Mail. The Post Office itself did not have any aircraft; all of the flying was contracted out---initially to the Army Air Service, and later to a long list of regional airlines. Air Mail was considered a high priority by the Post Office and proved very popular with the public. When the transcontinental route began proper operation in 1920, it became possible to get a letter from New York City to San Francisco in just 33 hours by transferring it between airplanes in a nearly non-stop relay race.
The Post Office's largesse in contracting the service to private operators provided not only the funding but the very motivation for much of our modern aviation industry. Air travel was not very popular at the time, being loud and uncomfortable, but the mail didn't complain. The many contract mail carriers of the 1920s grew and consolidated into what are now some of the United States' largest companies. For around a decade, the Post Office almost singlehandedly bankrolled civil aviation, and passengers were a side hustle [1]. Air mail ambition was not only of economic benefit. Air mail routes were often longer and more challenging than commercial passenger routes. Transcontinental service required regular flights through sparsely populated parts of the interior, challenging the navigation technology of the time and making rescue of downed pilots a major concern. Notably, air mail operators did far more nighttime flying than any other commercial aviation in the 1920s. The post office became the government's de facto technical leader in civil aviation. Besides the network of beacons and markers built to guide air mail between cities, the post office built 17 Air Mail Radio Stations along the transcontinental route. The Air Mail Radio Stations were the company radio system for the entire air mail enterprise, and the closest thing to a nationwide, public air traffic control service to then exist. They did not, however, provide what we would now call control. Their role was mainly to provide pilots with information (including, critically, weather reports) and to keep loose tabs on air mail flights so that a disappearance would be noticed in time to send search and rescue. In 1926, the Watres Act created the Aeronautic Branch of the Department of Commerce. The Aeronautic Branch assumed a number of responsibilities, but one of them was the maintenance of the Air Mail routes. 
Similarly, the Air Mail Radio Stations became Aeronautics Branch facilities, and took on the new name of Flight Service Stations. No longer just for the contract mail carriers, the Flight Service Stations made up a nationwide network of government-provided services to aviators. They were the first edifices in what we now call the National Airspace System (NAS): a complex combination of physical facilities, technologies, and operating practices that enable safe aviation. In 1935, the first en-route air traffic control center opened, a facility in Newark owned by a group of airlines. The Aeronautic Branch, since renamed the Bureau of Air Commerce, supported the airlines in developing this new concept of en-route control that used radio communications and paperwork to track which aircraft were in which airways. The rising number of commercial aircraft made in-air collisions a bigger problem, so the Newark control center was quickly followed by more facilities built on the same pattern. In 1936, the Bureau of Air Commerce took ownership of these centers, and ATC became a government function alongside the advisory and safety services provided by the flight service stations. En route center controllers worked off of position reports from pilots via radio, but needed a way to visualize and track aircraft's positions and their intended flight paths. Several techniques helped: first, airlines shared their flight planning paperwork with the control centers, establishing "flight plans" that corresponded to each aircraft in the sky. Controllers adopted a work aid called a "flight strip," a small piece of paper with the key information about an aircraft's identity and flight plan that could easily be handed between stations. By arranging the flight strips on display boards full of slots, controllers could visualize the ordering of aircraft in terms of altitude and airway. 
Second, each center was equipped with a large plotting table map where controllers pushed markers around to correspond to the position reports from aircraft. A small flag on each marker gave the flight number, so it could easily be correlated to a flight strip on one of the boards mounted around the plotting table. This basic concept of air traffic control, of a flight strip and a position marker, is still in use today. Radar The Second World War changed aviation more than any other event in history. Among the many advancements were two British inventions of particular significance: first, the jet engine, which would make modern passenger airliners practical. Second, the radar, and more specifically the magnetron. This was a development of such significance that the British government treated it as a secret akin to nuclear weapons; indeed, the UK effectively traded radar technology to the US in exchange for participation in US nuclear weapons research. Radar created radical new possibilities for air defense, and complemented previous air defense development in Britain. During WWI, the organization tasked with defending London from aerial attack had developed a method called "ground-controlled interception" or GCI. Under GCI, ground-based observers identify possible targets and then direct attack aircraft towards them via radio. The advent of radar made GCI tremendously more powerful, allowing a relatively small number of radar-assisted air defense centers to monitor for inbound attack and then direct defenders with real-time vectors. In the first implementation, radar stations reported contacts via telephone to "filter centers" that correlated tracks from separate radars to create a unified view of the airspace---drawn in grease pencil on a preprinted map. Filter center staff took radar and visual reports and updated the map by moving the marks. This consolidated information was then provided to air defense bases, once again by telephone.
Later technical developments in the UK made the process more automated. The invention of the "plan position indicator" or PPI, the type of radar scope we are all familiar with today, made the radar far easier to operate and interpret. Radar sets that automatically swept over 360 degrees allowed each radar station to see all activity in its area, rather than just aircraft passing through a defensive line. These new capabilities eliminated the need for much of the manual work: radar stations could see attacking aircraft and defending aircraft on one PPI, and communicated directly with defenders by radio. It became routine for a radar operator to give a pilot navigation vectors by radio, based on real-time observation of the pilot's position and heading. A controller took strategic command of the airspace, effectively steering the aircraft from a top-down view. The ease and efficiency of this workflow was a significant factor in the end of the Battle of Britain, and its remarkable efficacy was noticed in the US as well. At the same time, changes were afoot in the US. WWII was tremendously disruptive to civil aviation; while aviation technology rapidly advanced due to wartime needs, those same pressing demands led to a slowdown in nonmilitary activity. A heavy volume of military logistics flights and flight training, as well as growing concerns about defending the US from an invasion, meant that ATC was still a priority. A reorganization of the Bureau of Air Commerce replaced it with the Civil Aeronautics Authority, or CAA. The CAA's role greatly expanded as it assumed responsibility for airport control towers and commissioned new en route centers. As WWII came to a close, CAA en route control centers began to adopt GCI techniques. By 1955, the name Air Route Traffic Control Center (ARTCC) had been adopted for en route centers and the first air surveillance radars were installed. 
In a radar-equipped ARTCC, the map where controllers pushed markers around was replaced with a large tabletop PPI built to a Navy design. The controllers still pushed markers around to track the identities of aircraft, but they moved them based on their corresponding radar "blips" instead of radio position reports. Air Defense After WWII, post-war prosperity and wartime technology like the jet engine led to huge growth in commercial aviation. During the 1950s, radar was adopted by more and more ATC facilities (both "terminal" at airports and "en route" at ARTCCs), but there were few major changes in ATC procedure. With more and more planes in the air, tracking flight plans and their corresponding positions became labor intensive and error-prone. A particular problem was the increasing range and speed of aircraft, and correspondingly longer passenger flights, which meant that many aircraft passed from the territory of one ARTCC into another. This required that controllers "hand off" the aircraft, informing the "next" ARTCC of the flight plan and position at which the aircraft would enter their airspace. In 1956, 128 people died in a mid-air collision of two commercial airliners over the Grand Canyon. In 1958, 49 people died when a military fighter struck a commercial airliner over Nevada. These were not the only such incidents in the mid-1950s, and public trust in aviation started to decline. Something had to be done. First, in 1958 the CAA gave way to the Federal Aviation Administration. This was more than just a name change: the FAA's authority was greatly increased compared to the CAA, most notably by granting it authority over military aviation. This is a difficult topic to explain succinctly, so I will only give broad strokes. Prior to 1958, military aviation was completely distinct from civil aviation, with no coordination and often no communication at all between the two. This was, of course, a factor in the 1958 collision. 
Further, the 1956 collision, while it did not involve the military, did result in part from communications issues between separate distinct CAA facilities and the airline's own control facilities. After 1958, ATC was completely unified into one organization, the FAA, which assumed the work of the military controllers of the time and some of the role of the airlines. The military continues to have its own air controllers to this day, and military aircraft continue to include privileges such as (practical but not legal) exemption from transponder requirements, but military flights over the US are still beholden to the same ATC as civil flights. Some exceptions apply, void where prohibited, etc. The FAA's suddenly increased scope only made the practical challenges of ATC more difficult, and commercial aviation numbers continued to rise. As soon as the FAA was formed, it was understood that there needed to be major investments in improving the National Airspace System. While the first couple of years were dominated by the transition, the FAA's second director (Najeeb Halaby) prepared two lengthy reports examining the situation and recommending improvements. One of these, the Beacon report (also called Project Beacon), specifically addressed ATC. The Beacon report's recommendations included massive expansion of radar-based control (called "positive control" because of the controller's access to real-time feedback on aircraft movements) and new control procedures for airways and airports. Even better, for our purposes, it recommended the adoption of general-purpose computers and software to automate ATC functions. Meanwhile, the Cold War was heating up. US air defense, a minor concern in the few short years after WWII, became a higher priority than ever before. The Soviet Union had long-range aircraft capable of reaching the United States, and nuclear weapons meant that only a few such aircraft had to make it to cause massive destruction. 
The vast size of the United States (and, considering the new unified air defense command between the United States and Canada, all of North America) made this a formidable challenge. During the 1950s, the newly minted Air Force worked closely with MIT's Lincoln Laboratory (an important center of radar research) and IBM to design a computerized, integrated, networked system for GCI. When the Air Force committed to purchasing the system, it was christened the Semi-Automatic Ground Environment, or SAGE. SAGE is a critical juncture in the history of the computer and computer communications, the first system to demonstrate many parts of modern computer technology and, moreover, perhaps the first large-scale computer system of any kind. SAGE is an expansive topic that I will not take on here; I'm sure it will be the focus of a future article but it's a pretty well-known and well-covered topic. I have not so far felt like I had much new to contribute, despite it being the first item on my "list of topics" for the last five years. But one of the things I want to tell you about SAGE, that is perhaps not so well known, is that SAGE was not used for ATC. SAGE was a purely military system. It was commissioned by the Air Force, and its numerous operating facilities (called "direction centers") were located on Air Force bases along with the interceptor forces they would direct. However, there was obvious overlap between the functionality of SAGE and the needs of ATC. SAGE direction centers continuously received tracks from remote data sites using modems over leased telephone lines, and automatically correlated multiple radar tracks to a single aircraft. Once an operator entered information about an aircraft, SAGE stored that information for retrieval by other radar operators. 
When an aircraft with associated data passed from the territory of one direction center to another, the aircraft's position and related information were automatically transmitted to the next direction center by modem. One of the key demands of air defense is the identification of aircraft---any unknown track might be routine commercial activity, or it could be an inbound attack. The air defense command received flight plan data on commercial flights (and more broadly all flights entering North America) from the FAA and entered them into SAGE, allowing radar operators to retrieve "flight strip" data on any aircraft on their scope. Recognizing this interconnection with ATC, as soon as SAGE direction centers were being installed the Air Force started work on an upgrade called SAGE Air Traffic Integration, or SATIN. SATIN would extend SAGE to serve the ATC use-case as well, providing SAGE consoles directly in ARTCCs and enhancing SAGE to perform non-military safety functions like conflict warning and forward projection of flight plans for scheduling. Flight strips would be replaced by teletype output, and in general made less necessary by the computer's ability to filter the radar scope. Experimental trial installations were made, and the FAA participated readily in the research efforts. Enhancement of SAGE to meet ATC requirements seemed likely to meet the Beacon report's recommendations and radically improve ARTCC operations, sooner and cheaper than development of an FAA-specific system. As it happened, well, it didn't happen. SATIN became interconnected with another planned SAGE upgrade to the Super Combat Centers (SCC), deep underground combat command centers with greatly enhanced SAGE computer equipment. 
SATIN and SCC planners were so confident that the last three Air Defense Sectors scheduled for SAGE installation, including my own Albuquerque, were delayed under the assumption that the improved SATIN/SCC equipment should be installed instead of the soon-obsolete original system. SCC cost estimates ballooned, and the program's ambitions were reduced month by month until it was canceled entirely in 1960. Albuquerque never got a SAGE installation, and the Albuquerque air defense sector was eliminated by reorganization later in 1960 anyway. Flight Service Stations Remember those Flight Service Stations, the ones that were originally built by the Post Office? One of the oddities of ATC is that they never went away. FSS were transferred to the CAB, to the CAA, and then to the FAA. During the 1930s and 1940s many more were built, expanding coverage across much of the country. Throughout the development of ATC, the FSS remained responsible for non-control functions like weather briefing and flight plan management. Because aircraft operating under instrument flight rules must closely comply with ATC, the involvement of FSS in IFR flights is very limited, and FSS mostly serve VFR traffic. As ATC became common, the FSS gained a new and somewhat odd role: playing go-between for ATC. FSS were more numerous and often located in sparser areas between cities (while ATC facilities tended to be in cities), so especially in the mid-century, pilots were more likely to be able to reach an FSS than ATC. It was, for a time, routine for FSS to relay instructions between pilots and controllers. This is still done today, although improved communications have made the need much less common. As weather dissemination improved (another topic for a future post), FSS gained access to extensive weather conditions and forecasting information from the Weather Service. 
This connectivity is bidirectional; during the midcentury FSS not only received weather forecasts by teletype but transmitted pilot reports of weather conditions back to the Weather Service. Today these communications have, of course, been computerized, although the legacy teletype format doggedly persists. There has always been an odd schism between the FSS and ATC: they are operated by different departments, out of different facilities, with different functions and operating practices. In 2005, the FAA cut costs by privatizing the FSS function entirely. Flight service is now operated by Leidos, one of the largest government contractors. All FSS operations have been centralized to one facility that communicates via remote radio sites. While flight service is still available, increasing automation has made the stations far less important, and the general perception is that flight service is in its last years. Last I looked, Leidos was not hiring for flight service and the expectation was that they would never hire again, retiring the service along with its staff. Flight service does maintain one of my favorite internet phenomena, the phone number domain name: 1800wxbrief.com. One of the odd manifestations of the FSS/ATC schism and the FAA's very partial privatization is that Leidos maintains an online aviation weather portal that is separate from, and competes with, the Weather Service's aviationweather.gov. Since Flight Service traditionally has the responsibility for weather briefings, it is honestly unclear to what extent Leidos vs. the National Weather Service should be investing in aviation weather information services. For its part, the FAA seems to consider aviationweather.gov the official source, while it pays for 1800wxbrief.com. There's also weathercams.faa.gov, which duplicates a very large portion (maybe all?) of the weather information on Leidos's portal and some of the NWS's. It's just one of those things. Or three of those things, rather. 
Speaking of duplication due to poor planning... The National Airspace System Left in the lurch by the Air Force, the FAA launched its own program for ATC automation. While the Air Force was deploying SAGE, the FAA had mostly been waiting, and various ARTCCs had adopted a hodgepodge of methods ranging from one-off computer systems to completely paper-based tracking. By 1960 radar was ubiquitous, but different radar systems were used at different facilities, and correlation between radar contacts and flight plans was completely manual. The FAA needed something better, and with growing congressional support for ATC modernization, they had the money to fund what they called National Airspace System En Route Stage A. Further bolstering historical confusion between SAGE and ATC, the FAA decided on a practical, if ironic, solution: buy their own SAGE. In an upcoming article, we'll learn about the FAA's first fully integrated computerized air traffic control system. While the failed detour through SATIN delayed the development of this system, the nearly decade-long delay between the design of SAGE and the FAA's contract allowed significant technical improvements. This "New SAGE," while directly based on SAGE at a functional level, used later off-the-shelf computer equipment including the IBM System/360, giving it far more resemblance to our modern world of computing than SAGE with its enormous, bespoke AN/FSQ-7. And we're still dealing with the consequences today! [1] It also laid the groundwork for the consolidation of the industry, with a 1930 decision that took air mail contracts away from most of the smaller companies and awarded them instead to the precursors of United, TWA, and American Airlines.

23 hours ago 1 votes
Sierpiński triangle? In my bitwise AND?

Exploring a peculiar bit-twiddling hack at the intersection of 1980s geek sensibilities.

yesterday 4 votes
Reverse engineering the 386 processor's prefetch queue circuitry

In 1985, Intel introduced the groundbreaking 386 processor, the first 32-bit processor in the x86 architecture. To improve performance, the 386 has a 16-byte instruction prefetch queue. The purpose of the prefetch queue is to fetch instructions from memory before they are needed, so the processor usually doesn't need to wait on memory while executing instructions. Instruction prefetching takes advantage of times when the processor is "thinking" and the memory bus would otherwise be unused. In this article, I look at the 386's prefetch queue circuitry in detail. One interesting circuit is the incrementer, which adds 1 to a pointer to step through memory. This sounds easy enough, but the incrementer uses complicated circuitry for high performance. The prefetch queue uses a large network to shift bytes around so they are properly aligned. It also has a compact circuit to extend signed 8-bit and 16-bit numbers to 32 bits. There aren't any major discoveries in this post, but if you're interested in low-level circuits and dynamic logic, keep reading. The photo below shows the 386's shiny fingernail-sized silicon die under a microscope. Although it may look like an aerial view of a strangely-zoned city, the die photo reveals the functional blocks of the chip. The Prefetch Unit in the upper left is the relevant block. In this post, I'll discuss the prefetch queue circuitry (highlighted in red), skipping over the prefetch control circuitry to the right. The Prefetch Unit receives data from the Bus Interface Unit (upper right) that communicates with memory. The Instruction Decode Unit receives prefetched instructions from the Prefetch Unit, byte by byte, and decodes the opcodes for execution. This die photo of the 386 shows the location of the registers. Click this image (or any other) for a larger version. The left quarter of the chip consists of stripes of circuitry that appears much more orderly than the rest of the chip. 
This grid-like appearance arises because each functional block is constructed (for the most part) by repeating the same circuit 32 times, once for each bit, side by side. Vertical data lines run up and down, in groups of 32 bits, connecting the functional blocks. To make this work, each circuit must fit into the same width on the die; this layout constraint forces the circuit designers to develop a circuit that uses this width efficiently without exceeding the allowed width. The circuitry for the prefetch queue uses the same approach: each circuit is 66 µm wide1 and repeated 32 times. As will be seen, fitting the prefetch circuitry into this fixed width requires some layout tricks. What the prefetcher does The purpose of the prefetch unit is to speed up performance by reading instructions from memory before they are needed, so the processor won't need to wait to get instructions from memory. Prefetching takes advantage of times when the memory bus is otherwise idle, minimizing conflict with other instructions that are reading or writing data. In the 386, prefetched instructions are stored in a 16-byte queue, consisting of four 32-bit blocks.2 The diagram below zooms in on the prefetcher and shows its main components. You can see how the same circuit (in most cases) is repeated 32 times, forming vertical bands. At the top are 32 bus lines from the Bus Interface Unit. These lines provide the connection between the datapath and external memory, via the Bus Interface Unit. These lines form a triangular pattern as the 32 horizontal lines on the right branch off and form 32 vertical lines, one for each bit. Next are the fetch pointer and the limit register, with a circuit to check if the fetch pointer has reached the limit. Note that the two low-order bits (on the right) of the incrementer and limit check circuit are missing. 
At the bottom of the incrementer, you can see that some bit positions have a blob of circuitry missing from others, breaking the pattern of repeated blocks. The 16-byte prefetch queue is below the incrementer. Although this memory is the heart of the prefetcher, its circuitry takes up a relatively small area. A close-up of the prefetcher with the main blocks labeled. At the right, the prefetcher receives control signals. The bottom part of the prefetcher shifts data to align it as needed. A 32-bit value can be split across two 32-bit rows of the prefetch buffer. To handle this, the prefetcher includes a data shift network to shift and align its data. This network occupies a lot of space, but there is no active circuitry here: just a grid of horizontal and vertical wires. Finally, the sign extend circuitry converts a signed 8-bit or 16-bit value into a signed 16-bit or 32-bit value as needed. You can see that the sign extend circuitry is highly irregular, especially in the middle. A latch stores the output of the prefetch queue for use by the rest of the datapath. Limit check If you've written x86 programs, you probably know about the processor's Instruction Pointer (EIP) that holds the address of the next instruction to execute. As a program executes, the Instruction Pointer moves from instruction to instruction. However, it turns out that the Instruction Pointer doesn't actually exist! Instead, the 386 has an "Advance Instruction Fetch Pointer", which holds the address of the next instruction to fetch into the prefetch queue. But sometimes the processor needs to know the Instruction Pointer value, for instance, to determine the return address when calling a subroutine or to compute the destination address of a relative jump. So what happens? The processor gets the Advance Instruction Fetch Pointer address from the prefetch queue circuitry and subtracts the current length of the prefetch queue. 
The result is the address of the next instruction to execute, the desired Instruction Pointer value. The Advance Instruction Fetch Pointer—the address of the next instruction to prefetch—is stored in a register at the top of the prefetch queue circuitry. As instructions are prefetched, this pointer is incremented by the prefetch circuitry. (Since instructions are fetched 32 bits at a time, this pointer is incremented in steps of four and the bottom two bits are always 0.) But what keeps the prefetcher from prefetching too far and going outside the valid memory range? The x86 architecture infamously uses segments to define valid regions of memory. A segment has a start and end address (known as the base and limit) and memory is protected by blocking accesses outside the segment. The 386 has six active segments; the relevant one is the Code Segment that holds program instructions. Thus, the limit address of the Code Segment controls when the prefetcher must stop prefetching.3 The prefetch queue contains a circuit to stop prefetching when the fetch pointer reaches the limit of the Code Segment. In this section, I'll describe that circuit. Comparing two values may seem trivial, but the 386 uses a few tricks to make this fast. The basic idea is to use 30 XOR gates to compare the bits of the two registers. (Why 30 bits and not 32? Since 32 bits are fetched at a time, the bottom bits of the address are 00 and can be ignored.) If the two registers match, all the XOR values will be 0, but if they don't match, an XOR value will be 1. Conceptually, connecting the XORs to a 32-input OR gate will yield the desired result: 0 if all bits match and 1 if there is a mismatch. Unfortunately, building a 32-input OR gate using standard CMOS logic is impractical for electrical reasons, as well as inconveniently large to fit into the circuit. Instead, the 386 uses dynamic logic to implement a spread-out NOR gate with one transistor in each column of the prefetcher. 
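As a rough software model of the comparison described above (not the 386's actual circuit), the sketch below compares the upper 30 bits of two addresses with one XOR per bit position and treats any mismatch as pulling a shared "equality bus" low. The function name `reached_limit` and the boolean bus variable are illustrative assumptions, not terms from the chip.

```python
def reached_limit(fetch_pointer: int, limit: int) -> bool:
    """Model the 30-bit equality check between fetch pointer and limit.

    The bottom two address bits are ignored because fetches are 32 bits
    wide, so only bits 31..2 take part in the comparison.
    """
    equality_bus = True  # models the bus after it has been precharged high
    for bit in range(2, 32):
        a = (fetch_pointer >> bit) & 1
        b = (limit >> bit) & 1
        if a ^ b:
            # Any single mismatching bit pulls the shared bus low,
            # which is what makes this behave like one wide NOR gate.
            equality_bus = False
    return equality_bus


if __name__ == "__main__":
    print(reached_limit(0x1000, 0x1000))  # same address
    print(reached_limit(0x1000, 0x1003))  # differs only in the ignored low bits
    print(reached_limit(0x1000, 0x1004))  # differs in bit 2
```

The loop is sequential only because this is software; in the hardware every bit position is evaluated at once and the bus simply reflects whether any position found a difference.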
The schematic below shows the implementation of one bit of the equality comparison. The mechanism is that if the two registers differ, the transistor on the right is turned on, pulling the equality bus low. This circuit is replicated 30 times, comparing all the bits: if there is any mismatch, the equality bus will be pulled low, but if all bits match, the bus remains high. The three gates on the left implement XNOR; this circuit may seem overly complicated, but it is a standard way of implementing XNOR. The NOR gate at the right blocks the comparison except during clock phase 2. (The importance of this will be explained below.) This circuit is repeated 30 times to compare the registers. The equality bus travels horizontally through the prefetcher, pulled low if any bits don't match. But what pulls the bus high? That's the job of the dynamic circuit below. Unlike regular static gates, dynamic logic is controlled by the processor's clock signals and depends on capacitance in the circuit to hold data. The 386 is controlled by a two-phase clock signal.4 In the first clock phase, the precharge transistor below turns on, pulling the equality bus high. In the second clock phase, the XOR circuits above are enabled, pulling the equality bus low if the two registers don't match. Meanwhile, the CMOS switch turns on in clock phase 2, passing the equality bus's value to the latch. The "keeper" circuit keeps the equality bus held high unless it is explicitly pulled low, to avoid the risk of the voltage on the equality bus slowly dissipating. The keeper uses a weak transistor to keep the bus high while inactive. But if the bus is pulled low, the keeper transistor is overpowered and turns off. This is the output circuit for the equality comparison. This circuit is located to the right of the prefetcher. This dynamic logic reduces power consumption and circuit size. 
Since the bus is charged and discharged during opposite clock phases, you avoid steady current through the transistors. (In contrast, an NMOS processor like the 8086 might use a pull-up on the bus. When the bus is pulled low, you would end up with current flowing through the pull-up and the pull-down transistors. This would increase power consumption, make the chip run hotter, and limit your clock speed.) The incrementer After each prefetch, the Advance Instruction Fetch Pointer must be incremented to hold the address of the next instruction to prefetch. Incrementing this pointer is the job of the incrementer. (Because each fetch is 32 bits, the pointer is incremented by 4 each time. But in the die photo, you can see a notch in the incrementer and limit check circuit where the circuitry for the bottom two bits has been omitted. Thus, the incrementer's circuitry increments its value by 1, so the pointer (with two zero bits appended) increases in steps of 4.) Building an incrementer circuit is straightforward; for example, you can use a chain of 30 half-adders. The problem is that incrementing a 30-bit value at high speed is difficult because of the carries from one position to the next. It's similar to calculating 99999999 + 1 in decimal; you need to tediously carry the 1, carry the 1, carry the 1, and so forth, through all the digits, resulting in a slow, sequential process. The incrementer uses a faster approach. First, it computes all the carries at high speed, almost in parallel. Then it computes each output bit in parallel from the carries: if there is a carry into a position, it toggles that bit. Computing the carries is straightforward in concept: if there is a block of 1 bits at the end of the value, all those bits will produce carries, but carrying is stopped by the rightmost 0 bit. For instance, incrementing binary 11011 results in 11100; there are carries from the last two bits, but the zero stops the carries. 
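The two-step idea above (first work out the carries, then toggle every bit that receives one) can be sketched in software. This is a minimal illustration under my own naming, not a description of the 386's transistors: `increment_with_carries` and `next_fetch_pointer` are hypothetical helper names, and the loop is sequential where the hardware works nearly in parallel.

```python
def increment_with_carries(value: int, width: int = 30) -> int:
    """Add 1 to a `width`-bit value by computing carries, then toggling bits."""
    bits = [(value >> i) & 1 for i in range(width)]  # bits[0] is the low bit

    # Step 1: carry into each position. The "+1" is a carry fed into bit 0;
    # it keeps travelling through 1 bits and is stopped by the first 0 bit.
    carries = []
    carry = 1
    for b in bits:
        carries.append(carry)
        carry &= b

    # Step 2: each output bit is the input bit toggled by its incoming carry.
    result = 0
    for i, (b, c) in enumerate(zip(bits, carries)):
        result |= (b ^ c) << i
    return result


def next_fetch_pointer(pointer: int) -> int:
    """Advance a 32-bit fetch pointer by 4 via a 30-bit increment by 1."""
    return increment_with_carries(pointer >> 2, 30) << 2


if __name__ == "__main__":
    print(format(increment_with_carries(0b11011, 5), "05b"))
    print(hex(next_fetch_pointer(0x1000)))
```

Running the first example on 11011 gives 11100, matching the worked example in the text; the second shows how incrementing the upper 30 bits by 1 moves the full pointer forward by 4.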
A circuit to implement this was developed at the University of Manchester in England way back in 1959, and is known as the Manchester carry chain. In the Manchester carry chain, you build a chain of switches, one for each data bit, as shown below. For a 1 bit, you close the switch, but for a 0 bit you open the switch. (The switches are implemented by transistors.) To compute the carries, you start by feeding in a carry signal at the right. The signal will go through the closed switches until it hits an open switch, and then it will be blocked.5 The outputs along the chain give us the desired carry value at each position. Concept of the Manchester carry chain, 4 bits. Since the switches in the Manchester carry chain can all be set in parallel and the carry signal blasts through the switches at high speed, this circuit rapidly computes the carries we need. The carries then flip the associated bits (in parallel), giving us the result much faster than a straightforward adder. There are complications, of course, in the actual implementation. The carry signal in the carry chain is inverted, so a low signal propagates through the carry chain to indicate a carry. (It is faster to pull a signal low than high.) But something needs to make the line go high when necessary. As with the equality circuitry, the solution is dynamic logic. That is, the carry line is precharged high during one clock phase and then processing happens in the second clock phase, potentially pulling the line low. The next problem is that the carry signal weakens as it passes through multiple transistors and long lengths of wire. The solution is that each segment has a circuit to amplify the signal, using a clocked inverter and an asymmetrical inverter. Importantly, this amplifier is not in the carry chain path, so it doesn't slow down the signal through the chain. The Manchester carry chain circuit for a typical bit in the incrementer. 
The schematic above shows the implementation of the Manchester carry chain for a typical bit. The chain itself is at the bottom, with the transistor switch as before. During clock phase 1, the precharge transistor pulls this segment of the carry chain high. During clock phase 2, the signal on the chain goes through the "clocked inverter" at the right to produce the local carry signal. If there is a carry, the next bit is flipped by the XOR gate, producing the incremented output.6 The "keeper/amplifier" is an asymmetrical inverter that produces a strong low output but a weak high output. When there is no carry, its weak output keeps the carry chain pulled high. But as soon as a carry is detected, it strongly pulls the carry chain low to boost the carry signal. But this circuit still isn't enough for the desired performance. The incrementer uses a second carry technique in parallel: carry skip. The concept is to look at blocks of bits and allow the carry to jump over the entire block. The diagram below shows a simplified implementation of the carry skip circuit. Each block consists of 3 to 6 bits. If all the bits in a block are 1's, then the AND gate turns on the associated transistor in the carry skip line. This allows the carry skip signal to propagate (from left to right), a block at a time. When it reaches a block with a 0 bit, the corresponding transistor will be off, stopping the carry as in the Manchester carry chain. The AND gates all operate in parallel, so the transistors are rapidly turned on or off in parallel. Then, the carry skip signal passes through a small number of transistors, without going through any logic. (The carry skip signal is like an express train that skips most stations, while the Manchester carry chain is the local train to all the stations.) Like the Manchester carry chain, the implementation of carry skip needs precharge circuits on the lines, a keeper/amplifier, and clocked logic, but I'll skip the details. 
An abstracted and simplified carry-skip circuit. The block sizes don't match the 386's circuit. One interesting feature is the layout of the large AND gates. A 6-input AND gate is a large device, difficult to fit into one cell of the incrementer. The solution is that the gate is spread out across multiple cells. Specifically, the gate uses a standard CMOS NAND gate circuit with NMOS transistors in series and PMOS transistors in parallel. Each cell has an NMOS transistor and a PMOS transistor, and the chains are connected at the end to form the desired NAND gate. (Inverting the output produces the desired AND function.) This spread-out layout technique is unusual, but keeps each bit's circuitry approximately the same size. The incrementer circuitry was tricky to reverse engineer because of these techniques. In particular, most of the prefetcher consists of a single block of circuitry repeated 32 times, once for each bit. The incrementer, on the other hand, consists of four different blocks of circuitry, repeating in an irregular pattern. Specifically, one block starts a carry chain, a second block continues the carry chain, and a third block ends a carry chain. The block before the ending block is different (one large transistor to drive the last block), making four variants in total. This irregular pattern is visible in the earlier photo of the prefetcher. The alignment network The bottom part of the prefetcher rotates data to align it as needed. Unlike some processors, the x86 does not enforce aligned memory accesses. That is, a 32-bit value does not need to start on a 4-byte boundary in memory. As a result, a 32-bit value may be split across two 32-bit rows of the prefetch queue. Moreover, when the instruction decoder fetches one byte of an instruction, that byte may be at any position in the prefetch queue. 
To deal with these problems, the prefetcher includes an alignment network that can rotate bytes to output a byte, word, or four bytes with the alignment required by the rest of the processor. The diagram below shows part of this alignment network. Each bit exiting the prefetch queue (top) has four wires, for rotates of 24, 16, 8, or 0 bits. Each rotate wire is connected to one of the 32 horizontal bit lines. Finally, each horizontal bit line has an output tap, going to the datapath below. (The vertical lines are in the chip's lower M1 metal layer, while the horizontal lines are in the upper M2 metal layer. For this photo, I removed the M2 layer to show the underlying layer. Shadows of the original horizontal lines are still visible.)

Part of the alignment network.

The idea is that by selecting one set of vertical rotate lines, the 32-bit output from the prefetch queue will be rotated left by that amount. For instance, to rotate by 8, bits are sent down the "rotate 8" lines. Bit 0 from the prefetch queue will energize horizontal line 8, bit 1 will energize horizontal line 9, and so forth, with bit 31 wrapping around to horizontal line 7. Since horizontal bit line 8 is connected to output 8, the result is that bit 0 is output as bit 8, bit 1 is output as bit 9, and so forth.

The four possibilities for aligning a 32-bit value. The four bytes above are shifted as specified to produce the desired output below.

For the alignment process, one 32-bit output may be split across two 32-bit entries in the prefetch queue in four different ways, as shown above. These combinations are implemented by multiplexers and drivers. Two 32-bit multiplexers select the two relevant rows in the prefetch queue (blue and green above). Four 32-bit drivers are connected to the four sets of vertical lines, with one set of drivers activated to produce the desired shift. Each byte of each driver is wired to achieve the alignment shown above.
For instance, the rotate-8 driver gets its top byte from the "green" multiplexer and the other three bytes from the "blue" multiplexer. The result is that the four bytes, split across two queue rows, are rotated to form an aligned 32-bit value.

Sign extension

The final circuit is sign extension. Suppose you want to add an 8-bit value to a 32-bit value. An unsigned 8-bit value can be extended to 32 bits by simply filling the upper bits with zeroes. But for a signed value, it's trickier. For instance, -1 is the eight-bit value 0xFF, but the 32-bit value is 0xFFFFFFFF. To convert an 8-bit signed value to 32 bits, the top 24 bits must be filled in with the top bit of the original value (which indicates the sign). In other words, for a positive value, the extra bits are filled with 0, but for a negative value, the extra bits are filled with 1. This process is called sign extension.[^9]

In the 386, a circuit at the bottom of the prefetcher performs sign extension for values in instructions. This circuit supports extending an 8-bit value to 16 bits or 32 bits, as well as extending a 16-bit value to 32 bits. This circuit will extend a value with zeros or with the sign, depending on the instruction.

The schematic below shows one bit of this sign extension circuit. It consists of a latch on the left and right, with a multiplexer in the middle. The latches are constructed with a standard 386 circuit using a CMOS switch (see footnote).[^7] The multiplexer selects one of three values: the bit value from the swap network, 0 for sign extension, or 1 for sign extension. The multiplexer is constructed from a CMOS switch if the bit value is selected and two transistors for the 0 or 1 values. This circuit is replicated 32 times, although the bottom byte only has the latches, not the multiplexer, as sign extension does not modify the bottom byte.

The sign extend circuit associated with bits 31-8 from the prefetcher.
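The effect of this circuit can be modeled in a few lines. This is a sketch of the arithmetic only, assuming a hypothetical helper `extend` and its parameter names; it mirrors the three supported cases (8 to 16, 8 to 32, and 16 to 32 bits) and the choice between zero fill and sign fill, but says nothing about how the multiplexers and latches are built.

```python
def extend(value, from_bits, to_bits, signed):
    """Widen `value` from `from_bits` to `to_bits`.

    The upper bits are filled with the sign bit when `signed` is True,
    and with 0 otherwise. The low `from_bits` bits pass through unchanged.
    """
    assert (from_bits, to_bits) in {(8, 16), (8, 32), (16, 32)}
    low_mask = (1 << from_bits) - 1
    value &= low_mask
    sign_bit = (value >> (from_bits - 1)) & 1   # bit 7 or bit 15
    fill_with_ones = signed and sign_bit == 1
    upper_mask = ((1 << to_bits) - 1) ^ low_mask  # the bits being filled in
    return value | (upper_mask if fill_with_ones else 0)


if __name__ == "__main__":
    print(hex(extend(0xFF, 8, 32, signed=True)))   # -1 as a 32-bit value
    print(hex(extend(0xFF, 8, 32, signed=False)))  # 255, zero-extended
```

Note that the bottom byte is never altered, which matches the observation that the bottom byte of the circuit needs only latches and no multiplexer.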
The second part of the sign extension circuitry determines if the bits should be filled with 0 or 1 and sends the control signals to the circuit above. The gates on the left determine if the sign extension bit should be a 0 or a 1. For a 16-bit sign extension, this bit comes from bit 15 of the data, while for an 8-bit sign extension, the bit comes from bit 7. The four gates on the right generate the signals to sign extend each bit, producing separate signals for the bit range 31-16 and the range 15-8.

This circuit determines which bits should be filled with 0 or 1.

The layout of this circuit on the die is somewhat unusual. Most of the prefetcher circuitry consists of 32 identical columns, one for each bit.[^8] The circuitry above is implemented once, using about 16 gates (buffers and inverters are not shown above). Despite this, the circuitry above is crammed into bit positions 17 through 7, creating irregularities in the layout. Moreover, the implementation of the circuitry in silicon is unusual compared to the rest of the 386. Most of the 386's circuitry uses the two metal layers for interconnection, minimizing the use of polysilicon wiring. However, the circuit above also uses long stretches of polysilicon to connect the gates.

Layout of the sign extension circuitry. This circuitry is at the bottom of the prefetch queue.

The diagram above shows the irregular layout of the sign extension circuitry amid the regular datapath circuitry that is 32 bits wide. The sign extension circuitry is shown in green; this is the circuitry described at the top of this section, repeated for each bit 31-8. The circuitry for bits 15-8 has been shifted upward, perhaps to make room for the sign extension control circuitry, indicated in red. Note that the layout of the control circuitry is completely irregular, since there is one copy of the circuitry and it has no internal structure.
One consequence of this layout is the wasted space to the left and right of this circuitry block, the tan regions with no circuitry except vertical metal lines passing through. At the far right, a block of circuitry to control the latches has been wedged under bit 0. Intel's designers go to great effort to minimize the size of the processor die since a smaller die saves substantial money. This layout must have been the most efficient they could manage, but I find it aesthetically displeasing compared to the regularity of the rest of the datapath.

How instructions flow through the chip

Instructions follow a tortuous path through the 386 chip. First, the Bus Interface Unit in the upper right corner reads instructions from memory and sends them over a 32-bit bus (blue) to the prefetch unit. The prefetch unit stores the instructions in the 16-byte prefetch queue.

Instructions follow a twisting path to and from the prefetch queue.

How is an instruction executed from the prefetch queue? It turns out that there are two distinct paths. Suppose you're executing an instruction to add the hex value 12345678 to the EAX register. The prefetch queue will hold the five bytes 05 (the opcode), 78, 56, 34, and 12. The prefetch queue provides opcodes to the decoder one byte at a time over the 8-bit bus shown in red. The bus takes the lowest 8 bits from the prefetch queue's alignment network and sends this byte to a buffer (the small square at the head of the red arrow). From there, the opcode travels to the instruction decoder.[^10] The instruction decoder, in turn, uses large tables (PLAs) to convert the x86 instruction into a 111-bit internal format with 19 different fields.[^11]

The data bytes of an instruction, on the other hand, go from the prefetch queue to the ALU (Arithmetic Logic Unit) through a 32-bit data bus (orange). Unlike the previous buses, this data bus is spread out, with one wire through each column of the datapath.
This bus extends through the entire datapath so values can also be stored into registers. For instance, the MOV (move) instruction can store a value from an instruction (an "immediate" value) into a register.

Conclusions

The 386's prefetch queue contains about 7400 transistors, more than an Intel 8080 processor. (And this is just the queue itself; I'm ignoring the prefetch control logic.) This illustrates the rapid advance of processor technology: part of one functional unit in the 386 contains more transistors than an entire 8080 processor from 11 years earlier. And this unit is less than 3% of the entire 386 processor.

Every time I look at an x86 circuit, I see the complexity required to support backward compatibility, and I gain more understanding of why RISC became popular. The prefetcher is no exception. Much of the complexity is due to the 386's support for unaligned memory accesses, requiring a byte shift network to move bytes into 32-bit alignment. Moreover, at the other end of the instruction bus is the complicated instruction decoder that decodes intricate x86 instructions. Decoding RISC instructions is much easier.

In any case, I hope you've found this look at the prefetch circuitry interesting. I plan to write more about the 386, so follow me on Bluesky (@righto.com) or RSS for updates. I've written multiple articles on the 386 previously; a good place to start might be my survey of the 386 dies.

Footnotes and references

[^1]: The width of the circuitry for one bit changes a few times: while the prefetch queue and segment descriptor cache use a circuit that is 66 µm wide, the datapath circuitry is a bit tighter at 60 µm. The barrel shifter is even narrower at 54.5 µm per bit. Connecting circuits with different widths wastes space, since the wiring to connect the bits requires horizontal segments to adjust the spacing. But it also wastes space to use widths that are wider than needed.
Thus, changes in the spacing are rare, made only where the tradeoffs make it worthwhile.

[^2]: The Intel 8086 processor had a six-byte prefetch queue, while the Intel 8088 (used in the original IBM PC) had a prefetch queue of just four bytes. In comparison, the 16-byte queue of the 386 seems luxurious. (Some 386 processors, however, are said to only use 12 bytes due to a bug.) The prefetch queue assumes instructions are executed in linear order, so it doesn't help with branches or loops. If the processor encounters a branch, the prefetch queue is discarded. (In contrast, a modern cache will work even if execution jumps around.) Moreover, the prefetch queue doesn't handle self-modifying code. (It used to be common for code to change itself while executing to squeeze out extra performance.) By loading code into the prefetch queue and then modifying instructions, you could determine the size of the prefetch queue: if the old instruction was executed, it must be in the prefetch queue, but if the modified instruction was executed, it must be outside the prefetch queue. Starting with the Pentium Pro, x86 processors flush the prefetch queue if a write modifies a prefetched instruction.

[^3]: The prefetch unit generates "linear" addresses that must be translated to physical addresses by the paging unit (ref).

[^4]: I don't know which phase of the clock is phase 1 and which is phase 2, so I've assigned the numbers arbitrarily. The 386 creates four clock signals internally from a clock input CLK2 that runs at twice the processor's clock speed. The 386 generates a two-phase clock with non-overlapping phases. That is, there is a small gap between when the first phase is high and when the second phase is high. The 386's circuitry is controlled by the clock, with alternate blocks controlled by alternate phases. Since the clock phases don't overlap, this ensures that logic blocks are activated in sequence, allowing the orderly flow of data.
But because the 386 uses CMOS, it also needs active-low clocks for the PMOS transistors. You might think that you could simply use the phase 1 clock as the active-low phase 2 clock and vice versa. The problem is that these clock phases overlap when used as active-low; there are times when both clock signals are low. Thus, the two clock phases must be explicitly inverted to produce the two active-low clock phases. I described the 386's clock generation circuitry in detail in this article.

[^5]: The Manchester carry chain is typically used in an adder, which makes it more complicated than shown here. In particular, a new carry can be generated when two 1 bits are added. Since we're looking at an incrementer, this case can be ignored. The Manchester carry chain was first described in Parallel addition in digital computers: a new fast ‘carry’ circuit. It was developed at the University of Manchester in 1959 and used in the Atlas supercomputer.

[^6]: For some reason, the incrementer uses a completely different XOR circuit from the comparator, built from a multiplexer instead of logic. In the circuit below, the two CMOS switches form a multiplexer: if the first input is 1, the top switch turns on, while if the first input is a 0, the bottom switch turns on. Thus, if the first input is a 1, the second input passes through and then is inverted to form the output. But if the first input is a 0, the second input is inverted before the switch and then is inverted again to form the output. Thus, the second input is inverted if the first input is 1, which is a description of XOR.

The implementation of an XOR gate in the incrementer.

I don't see any clear reason why two different XOR circuits were used in different parts of the prefetcher. Perhaps the available space for the layout made a difference. Or maybe the different circuits have different timing or output current characteristics. Or it could just be the personal preference of the designers.
[^7]: The latch circuit is based on a CMOS switch (or transmission gate) and a weak inverter. Normally, the inverter loop holds the bit. However, if the CMOS switch is enabled, its output overpowers the signal from the weak inverter, forcing the inverter loop into the desired state. The CMOS switch consists of an NMOS transistor and a PMOS transistor in parallel. By setting the top control input high and the bottom control input low, both transistors turn on, allowing the signal to pass through the switch. Conversely, by setting the top input low and the bottom input high, both transistors turn off, blocking the signal. CMOS switches are used extensively in the 386, to form multiplexers, create latches, and implement XOR.

[^8]: Most of the 386's control circuitry is to the right of the datapath, rather than awkwardly wedged into the datapath. So why is this circuit different? My hypothesis is that since the circuit needs the values of bit 15 and bit 7, it made sense to put the circuitry next to bits 15 and 7; if this control circuitry were off to the right, long wires would need to run from bits 15 and 7 to the circuitry.

[^9]: In case this post is getting tedious, I'll provide a lighter footnote on sign extension. The obvious mnemonic for a sign extension instruction is SEX, but that mnemonic was too risqué for Intel. The Motorola 6809 processor (1978) used this mnemonic, as did the related 68HC12 microcontroller (1996). However, Steve Morse, architect of the 8086, stated that the sign extension instructions on the 8086 were initially named SEX but were renamed before release to the more conservative CBW and CWD (Convert Byte to Word and Convert Word to Double word). The DEC PDP-11 was a bit contradictory. It has a sign extend instruction with the mnemonic SXT; the Jargon File claims that DEC engineers almost got SEX as the assembler mnemonic, but marketing forced the change.
On the other hand, SEX was the official abbreviation for Sign Extend (see PDP-11 Conventions Manual, PDP-11 Paper Tape Software Handbook) and SEX was used in the microcode for sign extend. RCA's CDP1802 processor (1976) may have been the first with a SEX instruction, using the mnemonic SEX for the unrelated Set X instruction. See also this Retrocomputing Stack Exchange page.

[^10]: It seems inconvenient to send instructions all the way across the chip from the Bus Interface Unit to the prefetch queue and then back across the chip to the instruction decoder, which is next to the Bus Interface Unit. But this was probably the best alternative for the layout, since you can't put everything close to everything. The 32-bit datapath circuitry is on the left, organized into 32 columns. It would be nice to put the Bus Interface Unit over there too, but there isn't room, so you end up with the wide 32-bit data bus going across the chip. Sending instruction bytes across the chip is less of an impact, since the instruction bus is just 8 bits wide.

[^11]: See "Performance Optimizations of the 80386", Slager, Oct 1986, in Proceedings of ICCD, pages 165-168.

yesterday 4 votes
Code Matters

It looks like the code that the newly announced Figma Sites is producing isn’t the best. There are some cool Figma-to-WordPress workflows; I hope Sites gets more people exploring those options.

2 days ago 5 votes
What got you here…

John Siracusa: Apple Turnover From virtue comes money, and all other good things. This idea rings in my head whenever I think about Apple. It’s the most succinct explanation of what pulled Apple from the brink of bankruptcy in the 1990s to its astronomical success today. Don’

2 days ago 3 votes