This is a late prototype Macintosh Portable. But it turns out it's notable not merely for what it is but for what's on it: a beta version of System 6.0.6 (the doomed release that Apple pulled due to bugs), Apple sales databases, two online services — the maligned Mac Prodigy client, along with classic AppleLink as used by Apple staff — and two presentations, one on Apple's current Macintosh line and one on the upcoming System 7. Now that I've got the infamous Conner hard drive it came with safely copied over, it's time to explore its contents some more. It has an Androda 7MB expansion card to give it 8MB of RAM, just shy of the 9MB maximum it was capable of addressing, though no card sold for the Portable back in the day could achieve this and even the 9MB RAM cap is problematic. This is possibly due to the lowest 1MB of motherboard memory being shared between ROM and RAM (ROM overlays it at reset to allow the machine to start even with RAM in a non-deterministic state), as...


More from Old Vintage Computing Research

What went wrong with wireless USB

(Hat tip to the late Bill Strauss and The Capitol Steps' Lirty Dies.) I've been trying to take my Palm OS Fossil Wrist PDA smartwatch mobile. It has no on-board networking libraries but can be coerced into doing PPP over its serial port (via USB) by using the libraries from my Palm m505. Of course, that then requires it be constantly connected to a USB port, which is rather inconvenient for a wristwatch. But what if the USB connection could be made wirelessly? For a few years, real honest-to-goodness wireless USB devices were actually a thing. Competing standards led to market fracture and the technologies fizzled out relatively quickly in the marketplace, but like the parallel universe of FireWire hubs there was another parallel world of wireless USB devices, at least for a few years. As it happens, we now have a couple of them here, so it's worth exploring what wireless USB was and what happened to it, how the competing standards worked (and how well), and whether it would have helped. Once Wi-Fi arrived for the iBook G3 in 1999, people really started to believe a completely wireless future was possible — for any device. Wireless peripherals were nevertheless another type of network, just one involving only one computer and one user over a short range, which was grandiosely dubbed the "personal area network," or PAN, or WPAN, depending on executive and blood alcohol level. Although initial forms of Bluetooth were the first to arrive in this space, Bluetooth was never intended to handle the very high data rates that some wireless peripherals might require, and even modern high-speed Bluetooth isn't specced beyond 50 megabits/sec (though hold that thought for a later digression). The key enabling technology instead was the concept of ultra wide-band, or UWB, which in modern parlance collectively refers to technologies allowing very weak, very wide-spectrum (in excess of 500MHz) signals to become a short-range yet high-bandwidth communications channel. Wideband, in this case, is contrasted against the more typical narrowband. In general, radio transmission works by modulating a carrier wave of a specified frequency, changing its amplitude (AM), phase, and/or the frequency itself (FM), to encode a signal to be communicated. For terrestrial analogue broadcasting, a good example of narrowband radio, this might be an audio signal carrying some specified frequency range; for FM radio in the United States this audio signal ranges from 30Hz to 15kHz, enough to capture much of the human-audible range, plus various higher frequencies not intended for listening. This collective signal effectively becomes encoded into sidebands on one or both sides of the carrier frequency (even with AM), and per Carson's rule, the higher the maximum modulating frequency of the encoded signal, the larger the sidebands (and hence the bandwidth) must be. As a result, commercial radio stations in particular are often heavily filtered for coexistence to allow many stations to share the band: in the United States, within ITU Region 2, the Federal Communications Commission (FCC) divides the FM band from 88.0MHz to 108.0MHz into 100 "channels" of 200kHz each, putting the nominal carrier frequency in the middle of the channel to provide sufficient sideband width for modulation, and strictly regulates any spillover outside those channel boundaries. In practice, most adjacent U.S. FM stations are no closer than 400kHz, a balance between spectrum capacity and signal strength.
This typically permits a maximum FM stereo modulated frequency of about 53kHz; frequencies in the aggregate range being transmitted that are unused or unnecessary can be repurposed as subcarriers to emit additional information, such as FM stereo's 19kHz pilot tone subcarrier used to signal receivers, or Microsoft's brief flirtation with one-way transmissions to SPOT smartwatches. Doing so is "free" because the subcarrier frequency is already part of the frequency range. By contrast, signals like 802.11 Wi-Fi are wideband radio, or at least comparatively wider-band, because they pass much higher bandwidths. Although 802.11 frequencies (except for the very highest 45/60GHz band) are generally divided into 5MHz channels, people typically only use channels 1, 6 and 11 with 2.4GHz Wi-Fi, for example (or, in later standards, 1, 5, 9 and maybe 13), which spaces them by 20MHz or more. Compare this with medium-wave AM radio, where channel spacing in the United States is just 10kHz and even 9kHz in some countries like Australia, or shortwave radio with only 5kHz spacing. UWB's roots are in impulse radar, which is a form of the more familiar pulsed radar (such as the traffic cop at the corner) using much briefer radio pulses. Radar also works on a carrier wave model, but instead of FM or AM, the radar carrier wave is merely pulsed on and off. This is necessary so that the detector during the "off" phase can pick up echoes of the radio pulse transmitted during the "on" phase, and for most applications of radar, the pulse-repetition frequency (PRF) is much less than the frequency of the carrier wave being pulsed. Shorter, more frequent pulses would have theoretically yielded greater precision at close range, but such capability was beyond the electronics available to early radar researchers, who were more concerned with long-range detection anyway, where the off phase had to be of sufficient length to detect a distant reflection. By the 1970s, however, the technology had sufficiently advanced that the radar's PRF could approach its carrier frequency, making things like ground-penetrating radar possible. While higher frequencies couldn't travel through ground for great distances, they did yield much better resolution and therefore meaningful data. To a basic approximation UWB uses the same principle as impulse radar: a series of pulses, potentially as short as picoseconds long, of a particular carrier wave. As the carrier wave itself isn't changing, all of the information is necessarily being encoded in the pulses' timing. Because these are discontinuous waves, Carson's rule doesn't directly apply to most forms of UWB, but the analogous Shannon capacity limit indicates that rapid modulation from a high PRF would also require significant bandwidth — hence, ultra wide-band. To keep UWB transmissions from interfering with narrower-band transmissions on the same frequencies, the pulsed transmissions can be made at very low power, often below the typical noise floor for other transmissions. Naturally this also limits its range to perhaps a hundred or so metres at most, but it makes battery-powered operation highly practical. Its utility in location-finding comes from the fact that time-of-flight can be measured very quickly and exactly due to the short pulse lengths; when fully active, an Apple AirTag typically transmits a pulse about every other nanosecond.
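To put rough numbers on that tradeoff, consider Carson's rule for the FM example above and the Shannon-Hartley limit for a UWB-style channel. The little calculation below is my own back-of-the-envelope illustration, not anything from a spec, and the SNR figures are assumptions picked only to show the shape of the tradeoff:

#include <math.h>
#include <stdio.h>

int main(void)
{
    /* Carson's rule: occupied bandwidth is roughly 2 * (peak deviation +
       highest modulating frequency). US broadcast FM: 75kHz deviation,
       53kHz composite stereo signal. */
    double fm_bw = 2.0 * (75e3 + 53e3);               /* ~256kHz */

    /* Shannon-Hartley: capacity = B * log2(1 + S/N). */
    double fm_cap   = 200e3 * log2(1.0 + 1000.0);     /* narrowband, ~30dB SNR */
    double uwb_low  = 500e6 * log2(1.0 + 0.05);       /* 13dB below the noise floor */
    double uwb_even = 500e6 * log2(1.0 + 1.0);        /* signal merely equal to noise */

    printf("FM occupied bandwidth (Carson): %.0f kHz\n", fm_bw / 1e3);
    printf("200 kHz at 30 dB SNR:  %.1f Mbit/s\n", fm_cap / 1e6);
    printf("500 MHz at -13 dB SNR: %.1f Mbit/s\n", uwb_low / 1e6);
    printf("500 MHz at 0 dB SNR:   %.1f Mbit/s\n", uwb_even / 1e6);
    return 0;
}

Even while sitting at or below the noise floor, the sheer width of the channel wins, which is the entire trick UWB is exploiting.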
A standards group quickly emerged at the IEEE called the IEEE 802.15 Working Group for WPANs, addressing not only UWB but other WPAN-enabling technologies generally. The 802.15 WG had two arms, 802.15.4 for low bandwidth applications which we will not discuss further in this article (Zigbee is probably the most well-known in this category), and 802.15.3 for high bandwidth applications. Subsequently, the WiMedia Alliance was established that summer to capitalize on the new high-bandwidth technology, counting among other early members Eastman Kodak, Motorola, Hewlett-Packard, Intel, Philips, Samsung, Sharp and STMicroelectronics. 802.15.3 had obvious utility in determining precise location, but an extension called 802.15.3a in December 2002 sought to further enhance the standard for high-speed transmission of image and multimedia data. This team started with 23 proposals and whittled them down to two, DS-UWB (alternatively DS-CDMA) and MB-OFDM. DS-UWB stands for direct sequence ultra wide-band, where the data is simply sent as pulses (as in binary pulse AM) over the entire frequency range in use. However, although the low power of UWB prevents it from interfering with higher-power narrowband signals, an additional layer is needed to prevent UWB transmissions from interfering with each other (i.e., multiple access). DS-UWB uses a system similar to cellular CDMA (code-division multiple access) where each transmitter modulates the data signal with an even higher frequency pseudorandom code known to the receiver, hence its alternative name of DS-CDMA. An interfering transmitter without the same code will have its signals attenuated during the decoding (despreading) process and be ignored. Additionally, by making the transmitted signal require more bandwidth than the original one, the composite signal becomes spread over an even larger frequency range and thus more resistant to narrow-band interference (i.e., direct sequence spread spectrum). On the other hand, MB-OFDM (multiband orthogonal frequency-division multiplexing) instead employs a massive number of subcarriers to send its data. The basic principle of OFDM, which dates back to Bell Labs in 1966, is to divide up the desired digital signal into multiple bits transmitted in parallel on multiple simultaneous subcarriers. OFDM has many current applications, among others various standards for Wi-Fi and digital TV such as DVB-T and 802.11a/g/n/ac/ah. To avoid interference between the subcarriers yet maximize the channel's capacity, each subcarrier is separated by a minimum frequency distance usually computed as the reciprocal of the useful symbol duration, making each subcarrier orthogonal to the others surrounding it and easily discriminated. MB-OFDM as used here divides the approved range into fourteen 528MHz subbands of 128 subcarriers each, 100 of which are used for data transmission and the remainder for zeroes, guard tones and pilot signals. It solves the multiple access problem by hopping between transmission subfrequencies in a defined pattern (time-frequency coding), meaning each user ideally is on a different one at any given time while also avoiding narrowband intrusion on any one particular frequency. In practice the spec doesn't use all of the subbands simultaneously, bundling them into four bandgroups of three (with a fifth group of two, and a sixth group overlapping two others) and selecting a group as required by local regulation or to compensate for existing sources of interference. 
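The MB-OFDM arithmetic described above is worth seeing laid out: 528MHz divided across 128 subcarriers gives a 4.125MHz spacing, and the orthogonality condition makes the useful symbol time its reciprocal, about 242ns. A quick sketch of the numbers (my own illustration, not code from any implementation):

#include <stdio.h>

int main(void)
{
    double subband_hz  = 528e6;                       /* one WiMedia subband */
    int    subcarriers = 128;                         /* of which 100 carry data */
    double spacing_hz  = subband_hz / subcarriers;    /* 4.125 MHz */
    double symbol_s    = 1.0 / spacing_hz;            /* ~242.4 ns useful symbol */

    printf("subcarrier spacing: %.3f MHz\n", spacing_hz / 1e6);
    printf("useful symbol time: %.1f ns\n", symbol_s * 1e9);
    return 0;
}

Pilots, guard tones and the time-frequency hopping pattern all come out of that same budget, which is part of why headline rates and real-world rates diverge.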
The MB-OFDM camp organized itself as the MultiBand OFDM Alliance in June 2003, founded by Texas Instruments and crucially joined by Intel, while the DS-UWB camp was largely led by Motorola and subsequently Freescale, its spun-off semiconductor arm, who had significant investment in CDMA. Although MB-OFDM obviously demanded greater technical complexity, it also presented the prospect of much faster data rates, and as a result the MBOA continued to accrete members despite Motorola's protests. Motorola attempted to develop a compromise lower-speed Common Signaling Mode ("UWB CSM") so DS-UWB and MB-OFDM devices could coexist, but the process descended into squabbling, and Motorola pulled out of the WiMedia Alliance to establish the competing UWB Forum in 2004, exclusively focused on DS-UWB with CSM. As the standards argument raged in the background, OEMs meanwhile started evaluating potential market applications. After all, just about any potential short-range interconnect could be built on it; proposals to replace or reimplement Bluetooth with UWB were considered, as well as transports for IPv4 networking and FireWire (IEEE 1394). The original wireless USB concept in particular came from Motorola's new spinoff Freescale, who was determined to win the war anyway by getting their chipset to retail first, but also from Intel, who through its heavy influence on the USB Implementers Forum (USB-IF) persuaded the organization to adopt WiMedia's version of MB-OFDM as their officially blessed USB solution for high-speed wireless devices. In February 2004 Intel announced the formation of the Wireless USB (W-USB) Promoter Group, composed of themselves, Agere (now part of Broadcom via the former LSI Logic), Hewlett-Packard, Microsoft, NEC, Philips and Samsung, with an aim of shipping products within the next year. Because the W-USB name clashed with Freescale's initial branding, Intel and the USB-IF eventually settled on CW-USB ("Certified Wireless USB") and the MBOA was merged into the WiMedia Alliance in 2005. Now that the attempt to make an IEEE standard had clearly stalled for good, WiMedia submitted its own specification to Ecma instead, published as ECMA-368, and the 802.15.3a Task Group subsequently disbanded in January 2006. Both Freescale W-USB (later changed to Cord-Free USB and then Cable-Free USB, both of which we'll call CF-USB for short) and Intel CW-USB conceptually replicate the host-centric nature of USB 2.0, hewing more or less to the same basic topology but obviously without wires. Both systems supported up to 127 devices, and the over-the-air connection was necessarily encrypted, in both cases with AES-128. There were of course no compliant devices yet, nor compliant computers, so both competing standards required a dongle on the PC side and offered wireless USB hubs to connect existing peripherals. The main user-facing difference between Cable-Free and Certified Wireless USB was that CF-USB was intentionally, and in this case ironically, much closer to the wired USB spec. In particular, although CF-USB connections could only go point-to-point — just like a single cord — all USB features and transfer types were supported, even isochronous transfers for real-time data. CF-USB also had the compatibility edge in that the other end would look just like a regular USB hub to the computer, so no software update was necessary. CW-USB, on the other hand, although its virtual bus was much more flexible and devices could be hosts to other devices, wasn't fully backwards-compatible with all USB devices and needed new drivers and operating system support.
non-UWB wireless USB system that I'll come back to later on. Freescale's team eventually suffered management departures and failed to release any future CF-USB hardware, after which the UWB Forum itself imploded in 2007. Using a USB PC dongle made by Taiwanese OEM Gemtek, exhibitors were shown a PC and a digital camera associating with each other and the PC downloading images from the camera to its desktop, which Intel claimed could run at up to USB 2.0's full 480Mb/s at three metres (110Mb/sec up to 10). One heavily anticipated application was as a docking station you could just walk up to: if you had been previously associated, then boom, you were connected. The bandwidth, Intel promised, would be real and it would be spectacular. A few months later, Belkin's reworked dongle-hub kit — initially still called "Cable-Free" until Freescale objected — finally emerged for retail sale in 2007. Unfortunately, the chipset switch eliminated Belkin's Mac compatibility and it only came with Windows drivers. Worse, Belkin's hub took it on the chin in various reviews, citing an eighty percent reduction in throughput if the devices were just a foot away, and another 30% on top of that at four feet, with a maximum range of somewhere around six (or one big wall). This probably made it more secure, but definitely not more convenient, and far short of the claimed 10 metre maximum range. It doesn't look like Belkin sold very many. Another vendor was D-Link, who produced both dongles and hubs along with a starter kit containing one of each. This NOS example, utterly unused in a sealed box, had an original MSRP of about $170 ($225 in 2025 dollars) but showed up on eBay for $12. I couldn't resist picking it up and a couple other cheap CW-USB products to play with, all of which carried the proud and official Certified Wireless USB logo. I made sure one of them was a docking station since that was intended to be the killer app. all of them come with an HWA, even though only one of them actually (or at least officially) has a DWA hub: the D-Link starter kit (model DUB-9240), consisting of a DUB-2240 4-port DWA USB 2.0 hub and a DUB-1210 HWA. The TRULink #29596 Wireless USB to VGA and Audio Kit has two downstream devices with on-board DWAs, one for a VGA monitor (up to 1600x1200 or 1680x1050) and one for analogue audio, plus its own HWA; the Atlona AT-PCLink Wireless USB DisplayDock offers DVI video, 3.5mm (1/8") audio and two USB ports, advertised for your mouse and keyboard (but really a lurking hub also). The dock base, interestingly enough, is not a CW-USB device itself: you have to plug a DWA into it (included) which can go in one of two ports depending on physical configuration. In the package Atlona also includes another HWA. However, since they're all allegedly CW-USB 1.0 compliant, you should be able to use any HWA you want. (Theoretically. That's called "foreshadowing.") The D-Link and TRULink HWAs only support Windows XP SP3 and Vista — there was a short-lived Linux implementation that Intel themselves wrote, but it was very incomplete and eventually removed — and the Atlona HWA does too, but it also claims support for Windows 7 and even Mac OS X (Leopard and Snow Leopard). So our test system will be ... Virus Alert by Weird Al. Get out your Intel if you want to try this.) That covers the HWA and the HWA-DWA link, but being "USB" (after a fashion) you also need a driver for the device it's connecting to. 
Fortunately the TRULink and Atlona video systems are both DisplayLink-based, supporting screen mirroring and spanning, for which (Intel) Mac drivers also exist. Next comes the association process, which is necessary because obviously you don't want malicious USB devices trying to talk to you, and you don't want your next-door neighbour possibly being able to use your printer or read tax returns on your thumb drive. (I didn't know that was deductible!) The process of association generates a new AES-128 session key and records both 128-bit host and device IDs for future recognition. This shared 384-bit association context remains in effect until explicitly disabled: the associated device now won't interact with HWAs it doesn't know, other than to potentially associate with them also, and the HWA will only talk to devices with which it has been associated. It is possible, and absolutely supported, for a device to be associated with multiple HWAs. Association in CW-USB can be done in one of three ways: by factory pre-association (the TRULink and Atlona devices come pre-associated with their HWAs, for example); by numeric association, where the device provides an on-screen code (like Bluetooth pairing) or the PIN on the underside of the device can be manually entered (an alpha-numeric code like D0NTH4CKM3); or, uniquely, by cable association. For cable association you physically connect the CW-USB device to your PC or Mac via USB cable, let it be recognized by the HWA's driver, and then disconnect it. It then continues to act connected, just via the HWA. The D-Link DWA-hub is cable-associated as part of the installation process, or can be associated by PIN; it is the only one of these three that is not pre-associated. All devices support pre-association and some sort of numeric association, but a physical USB port is naturally required for cable association. It is nevertheless the most secure of the three methods, first because you have to have physical custody of both the device and the computer, second because it's a new and unique key, and third because key creation and distribution occurs entirely via the cable and never over the air. Unfortunately it's not possible to blacklist the other association methods, so you'd better not let your neighbour get your PIN. (You pay how much in mortgage interest??) Some devices like the TRULinks mercifully do support changing it, but that ability didn't seem universal in the devices I looked at. In this case, all three devices support cable association. The D-Link hub and the TRULink devices do so via their USB ports, but the Atlona dock does it by plugging the DWA into the computer instead of the docking base itself. The reverse process is also obviously possible to de-associate a device, and you can outright block devices as well, though this may require some fiddling if they were pre-associated. Similarly, most devices, including this one, have a reset button which will clear the association context(s) stored in them, removing any undesired linkages. Let's get the D-Link kit installed in the Vista VM. (How big did the wireless USB market ultimately get? Not big: UWB chipmaker WiQuest is long gone, and a few pieces of the company and its IP are now part of modern Staccato Communications.) The installer warns you not to plug in either the dongle or the hub until installation is complete and the hub has been cable-associated.
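Incidentally, the stored association context is compact: just three 128-bit values per pairing. As a rough sketch of the bookkeeping involved (the field and function names here are my own illustration, not from the spec or any driver):

#include <string.h>

struct assoc_context {
    unsigned char host_id[16];    /* 128-bit ID of the HWA */
    unsigned char device_id[16];  /* 128-bit ID of the associated device */
    unsigned char key[16];        /* AES-128 session key from association */
};

/* A device keeps one record per host it trusts and ignores everything
   else, other than offers to associate. */
int known_host(const struct assoc_context *table, int n,
               const unsigned char *host_id)
{
    int i;
    for (i = 0; i < n; i++)
        if (memcmp(table[i].host_id, host_id, 16) == 0)
            return 1;
    return 0;
}

int main(void)
{
    struct assoc_context paired = {{0}};
    unsigned char probe[16] = {0};

    paired.host_id[0] = 0xAB;     /* pretend IDs for the demo */
    probe[0] = 0xAB;
    return known_host(&paired, 1, probe) ? 0 : 1;
}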
Although I found it initially surprising that VMware didn't ask me about the device when I connected it, upon reflection it's perfectly logical that wireless USB devices wouldn't be seen at all by the Mac because we've effectively constructed a new and separate USB bus completely outside of it. For all the devices I connected to the hub-DWA, VMware was absolutely unaware of all of them, including the hub itself; the only device the MacBook and therefore VMware saw was the HWA. This is probably good from a performance view though possibly bad from a device control view. COM3: we just installed (the others are provided by VMware). emulate a serial port but only appear on the USB bus when there is activity, such as kicking off a HotSync. Notice the only thing connected to the MacBook is the HWA (all ports are on the left side of this model), but with the watch connected to the hub-DWA the Vista VM sees the new USB device appear. does appear to work suggests this should work on, say, Windows XP. Let's see if the Atlona AT-PCLink natively in Snow Leopard can do any better. It's time to bring on the docking station. .dmg double the size it needed to be. A help alias points to a small HTML-based manual, but the Windows version has a full PDF available on the disc. and had a Universal payload, even though Atlona said explicitly it wasn't compatible with PowerPC. I grabbed my great big A1139 17" DLSD PowerBook G4 1.67GHz, the last and mightiest PowerBook which I use as a portable DVD player and Leopard 10.5.8 test system, to see if it would work. this way? pilot-xfer, but that doesn't give you the rest of the PIM, and Mark/Space's Missing Sync does not run on Mavericks or later.) Fortunately, this is Snow Leopard, so we have O.G. Rosetta. associate or pair a new device. (Remember what I said about foreshadowing?) This restriction appears to be entirely due to the software and isn't unique to the Mac version; the Atlona manual indicates that the Windows version can't associate new devices either, other than possibly another Atlona dock. Officially it can be re-paired with its original DWA and that's it. /System/Library/WUSB/CBA.app/Contents/Resources/DB.plist, which lists associated devices. (The very location of this .plist again suggests it wasn't intended for user modification.) Here is the relevant portion, with keys suppressed: You can identify the WiMedia bandgroup (1, 3.1GHz to 4.8GHz), the 128-bit host and device IDs, and the 384-bit association context (which includes both IDs) in the key-value pairs. Yes, I could insert another device entry easily enough, but I wouldn't know the AES key the other end is using, so I couldn't compute a valid context. Since this driver is running natively and we're not paying a VM tax, let's see how well video streaming worked since oodles of cableless bandwidth was just about the entire use case for wireless USB. Snow Leopard welcome video and prove it wrong. (The Snow Leopard welcome video has audio in a separate file, so this comparison movie has no sound.) 2.0 Extender, used a much more familiar wireless transport: 802.11g. Yup — it's USB over Wi-Fi. bulk transfers, and I tend to believe Icron because it's their hardware — but both are clear that high-bandwidth devices like UVC webcams are going to have a bad time: "Icron Technologies Corporation does not guarantee that all USB devices are compatible with the WiRanger and only recommends the product be used with keyboard, mouse, and some flash drives." some of my devices use. bus like CW-USB. 
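Before the numbers, a quick sanity check on what 802.11g could plausibly deliver against a USB 2.0 bulk pipe. This is strictly my own back-of-the-envelope arithmetic with an assumed real-world throughput, not a measurement from any of these devices:

#include <stdio.h>

int main(void)
{
    double file_mbits = 1.09 * 1024.0 * 8.0;   /* the 1.09GB combo installer */
    double usb2_mbps  = 480.0;                 /* USB 2.0 signalling rate */
    double g_mbps     = 20.0;                  /* assumed usable 802.11g throughput */

    printf("USB 2.0 wire rate vs. 802.11g: about %.0fx\n", usb2_mbps / g_mbps);
    printf("1.09GB at %.0f Mbit/s: roughly %.0f seconds\n",
           g_mbps, file_mbits / g_mbps);
    return 0;
}

Even before protocol overhead, the radio is the bottleneck by more than an order of magnitude, which is exactly why Icron only promises keyboards, mice and some flash drives.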
I wanted to do some performance tests with it, but strangely macOS Sequoia will not recognize the sender when connected to my M1 MacBook Air, even though it worked fine connected to the Raptor POWER9 in Fedora 41 and was seen as a hub there too. So we'll do the tests on the MacBook as well, which also had no problem seeing and using the pair. Again, we'll just copy that same 1.09GB combo installer and see how long it takes. pilot-xfer from the command line instead of Palm Desktop. Slirp or usb2ppp. as before. It actually doesn't feel much different in terms of speed from a direct connect and I didn't find it particularly unstable. Of what we've explored here, the Gefen box seems the least complicated solution, though the receiver pulled a bit more battery power and of course you'd need a host around to connect through. As such I'm still using the "Raspberry Pi in a camera bag" notion for the time being, but it's nice to see it can work other ways. My experience with the Gefen/Icron extender was generally consistent with other reviews that found it adequate for undemanding tasks like printing. However, it doesn't look like either Icron or Gefen sold many of them, likely due to their unattractive price tag. Icron did announce plans for a much faster wireless USB solution on the 60GHz band using 802.11ad, which with its 7 Gbit/s capacity would easily handle USB 2.0 and even 5 Gbit/s 3.0, but it doesn't seem like the device was ever offered for sale. (A couple people have mentioned to me that there were other ≥802.11ad wireless USB products out there, including WiGig. Their bandwidth was reportedly more than sufficient for video but I don't have any of those devices here, and they are better understood as Wi-Fi routers that also do USB device sharing.) Although Icron still sells extenders, now as a division of Analog Devices, all of their current products are wired-only. As for CW-USB, by 2008 few laptops on the market offered the feature, and even for those that did like the Lenovo ThinkPad X200, it was always an extra option. That meant most computers that wanted to connect to a CW-USB device still needed a HWA-dongle, so they still took up a port, and HWAs never got produced in enough numbers to be cheap. On top of that, it further didn't help matters that anything close to the promised maximum bandwidth was hardly ever observed in real-world situations. Device makers, meanwhile, mostly chose to wait out greater availability of CW-USB capable computers, and since that never happened, neither a large number of computers with built-in CW-USB nor CW-USB devices were ever made before the standard was abandoned. The last stand of UWB as a device interconnect was ironically as a proposal for Bluetooth 3.0+HS, the 2009 optional high-data-rate specification. 3.0+HS introduced AMP (Alternative MAC/PHY) as a bolt-on method, in which low-speed Bluetooth would be used to set up a link and then the high-speed data exchange would occur over the second transport, originally MB-OFDM. With CW-USB fading from the market, however, the WiMedia Alliance closed its doors in 2009 and shed its existing work to the USB-IF, the W-USB Promoter Group and the Bluetooth SIG. This move was controversial with some WiMedia members, who consequently refused to grant access to their intellectual property to the new successor groups, and instead AMP ended up being based on 802.11 as well. The AMP extension was little used and eventually removed in Bluetooth 5.3. Is there a moral to this story? I'm not quite certain. 
As so often happens, the best technology didn't win; in my eyes CF-USB had the potential to be more widely adopted because of its simplicity and compatibility, but it was ruined when Freescale got greedy and it never recovered. That said, the real question is whether wireless USB itself, with all of its broken promises, was the right approach for the WPAN concept. It's certainly not an indictment of ultra wideband, which is used today more than ever before: many chips are still produced that implement it, the best known undoubtedly being Apple's U1 and U2 chips in iOS and iOS-adjacent devices like the AirTag, and such chips continue to be widely used for things such as precise location fixing and local interactions. UWB has also been used for diverse tasks like tracking NFL players during games or parts during factory assembly, and for autonomous vehicles in particular it's extremely useful. In a sense we ended up with the wirelessly connected future we were promised without wireless USB — and we just never noticed. Maybe that's the moral.

A few retrobits updates on Floodgap

Just a brief programming note. Before this blog there was Floodgap Retrobits, and I still maintain those pages. One of the earliest was my Tomy Tutor-specific page devoted to my very first computer, which we got in 1983. The Tutors are relatives of the Texas Instruments home computers, closely patterned after the unreleased TI 99/8; the history of the Japanese models is relatively well-known, and there are a number of Japanese enthusiasts who specialize in the Pyuuta, the Tutor's ancestor system. On the other hand, hardly anybody knows anything about the British version. That system is the Grandstand Tutor, as covered in Your Computer, October 1983, at that time one of the major home computer magazines in the UK. The original form of this machine was very similar to the Pyuuta, which emphasized its TMS9918A-based built-in paint program and, due to its specialization in graphics, implemented only a very simplified animation-oriented dialect of BASIC called GBASIC. Adam Imports rebadged many Asian and some American toys and games for the UK (and, for some period of time, New Zealand) and had a particularly close relationship with Tomy. In fact, the relationship was close enough that Adam rejected this initial version as uncompetitive with other home computers and sent it back to Tomy for an upgraded BASIC. Tomy provided this by modifying TI Extended BASIC, calling it Tomy BASIC and implementing it as a second mode accessible from the system's menu-based interface. The absence of Tomy BASIC places the earliest Grandstand Tutor prior to the American Tomy Tutor, which also has the upgraded Tomy BASIC. Tomy subsequently sold this upgrade in at least two forms as an option for the Japanese machines as well. The interesting part is that while PAL Tutors have been documented to exist (the American Tutor is obviously NTSC), no one yet has reported finding a Grandstand. It wouldn't be hard to distinguish one — the photograph shows obvious Grandstand branding on its silver badge. It's possible they were never released at all: even accounting for publishing delays, the second Grandstand would have emerged late in 1983, hitting in the wake of the video game crash and against heavier hitters like the Commodore VIC-20 and Commodore 64 as well as (in the UK) the ZX Spectrum. Adam may have simply concluded it wasn't a strong enough competitor to sell even with the upgraded BASIC. There are also updates to my Solbourne pages. Solbourne's SPARC-compatible multiprocessor systems ran OS/MP, their SunOS-derived operating system, which was compatible with much contemporary Sun software, including SunView. Later their IDT workstations, though uniprocessor, competed directly with and could even squeak by contemporary SPARCstations, at least in the beginning. Solbourne eventually ran out of money when they hit engineering limits with their own CPU and could never reclaim the throughput crown, abandoning the computer hardware market in 1994. We might be adding more remembrances as other Solbourne engineers are contacted. You can see these updates at The Little Orphan Tomy Tutor as well as past Old VCR Tomy articles, and The Solbourne Solace as well as past Old VCR Solbourne-specific articles. Naturally, if you have anything to add, feel free to post in the comments or drop me E-mail at ckaiser at floodgap dawt com.

Let's give PRO/VENIX a barely adequate, pre-C89 TCP/IP stack (featuring Slirp-CK)

Years ago I bought TCP/IP Illustrated (what would now be called the first edition, prior to the 2011 update) for a hundred-odd bucks on sale, and it has now sat on my bookshelf, encased in its original shrinkwrap, for at least twenty years. It would be fun to put up the 4.4BSD data structures poster it came with but that would require opening it. Fortunately, today we have AI... er, we have many more excellent and comprehensive documents on the subject, and more importantly, we've recently brought back up an oddball platform that doesn't have networking either: our DEC Professional 380 running the System V-based PRO/VENIX V2.0, which you met a couple articles back. The DEC Professionals are a notoriously incompatible member of the PDP-11 family and, short of DECnet (DECNA) support in its unique Professional Operating System, there's officially no other way you can get one on a network — let alone the modern Internet. Are we going to let that stop us? Of course not: by the end we'll even be pulling down HTTPS pages through a Crypto Ancienne proxy for TLS 1.3. And, as we'll discuss, if you can get this thing on the network, you can get almost anything on the network! Easily portable and painfully verbose source code is included. Recall from our lengthy history of DEC's early misadventures with personal computers that, in Digital's ill-advised plan to avoid the DEC Pros cannibalizing low-end sales from their canonical PDP-11 minicomputers, Digital's Small Systems Group deliberately made the DEC Professional series nearly totally incompatible despite the fact they used the same CPUs. In their initial roll-out strategy in 1982, the Pros (as well as their sibling systems, the Rainbow and the DECmate II) were supposed to be mere desktop office computers — the fact the Pros were PDP-11s internally was mostly treated as an implementation detail. The idea backfired spectacularly against the IBM PC when the Pros and their promised office software failed to arrive on time, and in 1984 DEC retooled around a new concept of explicitly selling the Pros as desktop PDP-11s. This required porting operating systems that PDP-11 minis typically ran: RSX-11M Plus was already there as the low-level layer of the Professional Operating System (P/OS), and DEC internally ported RT-11 (as PRO/RT-11) and COS. PDP-11s were also famous for running Unix, and so DEC needed a Unix for the Pro as well, though eventually only one official option was ever available: PRO/VENIX, a port of VenturCom's Venix based on V7 Unix and later on System V Release 2.0. After the last article, I had the distinct pleasure of being contacted by Paul Kleppner, the company's first paid employee in 1981, who was part of the group at VenturCom that did the Pro port and stayed at the company until 1988. Venix was originally developed from V6 Unix on the PDP-11/23, incorporating the real-time kernel extensions (such as semaphores and asynchronous I/O) of Myron Zimmerman, then a postdoc in physics at MIT; Kleppner's father was the professor of the lab Zimmerman worked in. Zimmerman founded VenturCom in 1981 to capitalize on the emerging Unix market, becoming one of the earliest commercial Unix licensees. Venix-11 was subsequently based on the later V7 Unix, as was Venix/86, which was the first Unix on the IBM PC in January 1983 and was ported to the DEC Rainbow as Venix/86R. In addition to its real-time extensions and enhanced segmentation capability, critical for memory management in smaller 16-bit address spaces, it also included a full desktop graphics package.
Notably, DEC themselves were also a Unix licensee through their Unix Engineering Group and already had an enhanced V7 Unix of their own running on the PDP-11, branded initially as V7M. Subsequently the UEG developed a port of 4.2BSD with some System V components for the VAX and planned to release it as Ultrix-32, simultaneously retconning V7M as Ultrix-11 even though it had little in common with the VAX release. Paul recalls that DEC did attempt a port of Ultrix-11 to the Pro 350 themselves but ran into intractable performance problems. By then the clock was ticking on the Pro relaunch and the issues with Ultrix-11 likely prompted DEC to look for alternatives. Crucially, Zimmerman had managed to upgrade Venix-11's kernel while still keeping it small, a vital aspect on his 11/23 which lacked split instruction and data addressing and would have had to page in and out a larger kernel otherwise. Moreover, the 11/23 used an F-11 CPU — the same CPU as the original Professional 350 and 325. DEC quickly commissioned VenturCom to port their own system over to the Pro, which Paul says was a real win for VenturCom, and the first release came out in July 1984 complete with its real-time features intact and graphics support for the Pro's bitmapped screen. It was upgraded ("PRO/VENIX Rev 2.0") in October 1984, adding support for the new top-of-the-line DEC Professional 380, and then switched to System V (SVR2) in July 1985 with PRO/VENIX V2.0. (For its part Ultrix-11 was released as such in 1984 as well, but never for the Pro series.) Keep that kernel version history in mind for when we get to the oddiments of the C compiler. As for networking, though, with the exception of UUCP over serial, none of these early versions of Venix on either the PDP-11 or 8086 supported any kind of network connectivity out of the box — officially the only Pro operating system to support its Ethernet upgrade option was P/OS 2.0. Although all Pros have a 15-pin AUI network port, it isn't activated until an Ethernet CTI card is installed. (While Stan P. found mention of a third-party networking product called Fusion by Network Research Corporation which could run on PRO/VENIX, Paul's recollection is that this package ran into technical problems with kernel size during development. No examples of the PRO/VENIX version have so far been located and it may never have actually been released. You'll hear about it if a copy is found. The unofficial Pro 2.9BSD port also supports the network card, but that was always an under-the-table thing.) Since we run Venix on our Pro, that means currently our only realistic option to get this on the 'Nets is also over a serial port; we'll use the printer port, the lower-speed one, for our serial IP implementation. PRO/VENIX supports using only the RS-423 port as a remote terminal, and because it's twice as fast, it's more convenient for logins and file exchange over Kermit (which also has no TCP/IP overhead). Using the printer port also provides us with a nice challenge: if our stack works acceptably well at 4800bps, it should do even better at higher speeds if we port it elsewhere. On the Pro, we connect to our upstream host using a BCC05 cable, which terminates in a regular 25-pin RS-232 on the other end. Now for the software part. There are other small TCP/IP stacks, notably Adam Dunkels' lwIP and others.
But even SVR2 Venix is by present standards an old Unix with a much less extensive libc and a more primitive C compiler — in a short while you'll see just how primitive — and relatively modern code like lwIP's would require a lot of porting. Ideally we'd like a very minimal, indeed barely adequate, stack that can do simple tasks and can be expressed in a fashion acceptable to a now antiquated compiler. Once we've written it, it would be nice if it were also easily portable to other very limited systems, even by directly translating it to assembly language if necessary. What we want this barebones stack to accomplish will inform its design. Should it act as a server? That would mean keeping the software and the hardware up 24-7 to make such a use case meaningful. The Ethernet option was reportedly competent at server tasks, but Ethernet has more bandwidth, and that card also has additional on-board hardware. Let's face the cold reality: as a server, we'd find interacting with it over the serial port unsatisfactory at best and we'd use up a lot of power and MTBF keeping it on more than we'd like to. Therefore, we really should optimize for the client case, which means we also only need to run the client when we're performing a network task. Similarly, on a machine with no remote login capacity (like, I dunno, a C64), the person on the console gets it all. Therefore, we really should optimize for the single user case, which means we can simplify our code substantially by merely dealing with sockets sequentially, one at a time, without having to worry about routing packets we get on the serial port to other tasks or multiplexing them. Doing so would require extra work for dual-socket protocols like FTP, but we're already going to use directly-attached Kermit for that, and if we really want file transfer over TCP/IP there are other choices. (On a larger antique system with multiple serial ports, we could consider a setup where each user uses a separate outgoing serial port as their own link, which would also work under this scheme.) Some of you may find this conflicts with your notion of what a "stack" should provide, but I also argue that the breadth of a full-service driver would be wasted on a limited configuration like this and be unnecessarily more complex to write and test. Worse, in many cases, is better, and I assert this particular case is one of them. Keeping the above in mind, what are appropriate client tasks for a microcomputer from 1984, now over 40 years old — even a fairly powerful one by the standards of the time — to do over a slow TCP/IP link? Simple text-oriented protocols like Gopher and HTTP come to mind (Crypto Ancienne's carl can serve as an HTTP-to-HTTPS proxy to handle the TLS part, if necessary). We could use protocols like these to download and/or view files from systems that aren't directly connected, or to send and receive status information. One task that is also likely common is an interactive terminal connection (e.g., Telnet, rlogin) to another host. However, as a client this particular deployment is still likely to hit the same sorts of latency problems for the same reasons we would experience connecting to it as a server. The other tasks are not highly sensitive to latency, require only a single "connection" and no multiplexing, and are simple protocols which are easy to implement. Let's call this feature set our minimum viable product. Because we're writing only for a couple of specific use cases, and to make them even more explicit and easy to translate, we're going to take the unusual approach of having each of these clients handle their own raw packets in a bytewise manner.
For the actual serial link we're going to go even more barebones and use old-school RFC 1055 SLIP instead of PPP (uncompressed, too, not even Van Jacobson CSLIP). This is trivial to debug and straightforward to write, and if we do so in a relatively encapsulated fashion, we could consider swapping in CSLIP or PPP later on. A couple of utility functions will do the IP checksum algorithm and reading and writing the serial port, and DNS and some aspects of TCP also get their own utility subroutines, but otherwise all of the programs we will create will read and write their own network datagrams, using the SLIP code to send and receive over the wire. The C we will write will also be intentionally very constrained, using bytewise operations assuming nothing about endianness and using as little of the C standard library as possible. For types, you only need some sort of 32-bit long, which need not be native, an int of at least 16 bits, and a char type — which can be signed, and in fact has to be to run on earlier Venices (read on). You can run the entirety of the code with just malloc/free, read/write/open/close, strlen/strcat, sleep, rand/srand and time for the srand seed (and fprintf for printing debugging information, if desired). On a system with little or no operating system support, almost all of these primitive library functions are easy to write or simulate, and we won't even assume we're capable of non-blocking reads despite the fact Venix can do so. After all, from that which little is demanded, even less is expected. Traditionally you'd put a machine on a SLIP link with something like slattach, which effectively turns a serial port directly into a network interface. Such an arrangement would be the most flexible approach from the user's perspective because you necessarily have a fixed, bindable external address, but obviously such a scheme didn't scale over time. With the proliferation of dialup Unix shell accounts in the late 1980s and early 1990s, closed-source tools like 1993's The Internet Adapter ("TIA") could provide the SLIP and later PPP link just by running them from a shell prompt. Because they synthesize artificial local IP addresses, sort of NAT before the concept explicitly existed, the architecture of such tools prevented directly creating listening sockets — though for some situations this could be considered more of a feature than a bug. Any needed external ports could be proxied by the software anyway and later network clients tended not to require it, so for most tasks it was more than sufficient. Closed-source and proprietary SLIP/PPP-over-shell solutions like TIA were eventually displaced by open source alternatives, most notably SLiRP. SLiRP (hereafter Slirp so I don't gouge my eyes out) emerged in 1995 and used a similar architecture to TIA, handing out virtual addresses on a synthetic network and bridging that network to the Internet through the host system. It rapidly became the SLIP/PPP shell solution of choice, leading to its outright ban by some shell ISPs who claimed it violated their terms of service. As direct SLIP/PPP dialup became more common than shell accounts, during which time yours truly upgraded to a 56K Mac modem I still have around here somewhere, Slirp eventually became most useful for connecting small devices via their serial ports (PDAs and mobile phones especially, but really anything — subsets of Slirp are still used in emulators today like QEMU for a similar purpose) to a LAN. By a shocking and completely contrived coincidence, that's exactly what we'll be doing!
Slirp has not been officially maintained since 2006. There is no package in Fedora, which is my usual desktop Linux, and the one in Debian reportedly has issues. A stack of patch sets circulated thereafter, but the planned 1.1 release never happened and other crippling bugs remain, some of which were addressed in other patches that don't seem to have made it into any release, source or otherwise. If you tried to build Slirp from source on a modern system and it just immediately exits, you got bit. I have incorporated those patches and a couple of my own to port naming and the configure script, plus some additional fixes, into an unofficial "Slirp-CK" which is on Github. It builds the same way as prior versions and is tested on Fedora Linux. I'm working on getting it functional on current macOS also. Next, I wrote up our four basic functional clients: ping, DNS lookup, NTP client (it doesn't set the clock, just shows you the stratum, refid and time which you can use for your own purposes), and TCP client. The TCP client accepts strings up to a defined maximum length, opens the connection, sends those strings (optionally separated by CRLF), and then reads the reply until the connection closes. This all seemed to work great on the Linux box, which you yourself can play with as a toy stack (directions at the end). Unfortunately, I then pushed it over to the Pro with Kermit and the compiler immediately started complaining. SLIP is a very thin layer on IP packets. There are exactly four metabytes, which I created preprocessor defines for: A SLIP packet ends with SLIP_END, or hex $c0. Where this must occur within a packet, it is replaced by a two byte sequence for unambiguity, SLIP_ESC SLIP_ESC_END, or hex $db $dc, and where the escape byte must occur within a packet, it gets a different two byte sequence, SLIP_ESC SLIP_ESC_ESC, or hex $db $dd. Although I initially set out to use defines and symbols everywhere instead of naked bytes, and wrote slip.c on that basis, I eventually settled on raw bytes afterwards using copious comments so it was clear what was intended to be sent. That probably saved me a lot of work renaming everything, because: Dimly I recalled that early C compilers, including System V, limit their identifiers to eight characters (the so-called "Ritchie limit"). At this point I probably should have simply removed them entirely for consistency with their absence elsewhere, but I went ahead and trimmed them down to more opaque, pithy identifiers. That wasn't the only problem, though. I originally had two functions in slip.c, slip_start and slip_stop, and it didn't like that either despite each appearing to have a unique eight-character prefix: That's because their symbols in the object file are actually prepended with various metacharacters like _ and ~, so effectively you only get seven characters in function identifiers, an issue this error message fails to explain clearly. The next problem: there's no unsigned char, at least not in PRO/VENIX Rev. 2.0 which I want to support because it's more common, and presumably the original versions of PRO/VENIX and Venix-11. (This type does exist in PRO/VENIX V2.0, but that's because it's System V and has a later C compiler.) In fact, the unsigned keyword didn't exist at all in the earliest C compilers, and even when it did, it couldn't be applied to every basic type. 
Although unsigned char was introduced in V7 Unix and is documented as legal in the PRO/VENIX manual, and it does exist in Venix/86 2.1 which is also a V7 Unix derivative, the PDP-11 and 8086 C compilers have different lineages and Venix's V7 PDP-11 compiler definitely doesn't support it: I suspect this may not have been intended because unsigned int works (unsigned long would be pointless on this architecture, and indeed correctly generates Misplaced 'long' on both versions of PRO/VENIX). Regardless of why, however, the plain char type on the PDP-11 is signed, and for compatibility reasons here we'll have no choice but to use it. Recall that when C89 was being codified, plain char was left as an ambiguous type since some platforms (notably PDP-11 and VAX) made it signed by default and others made it unsigned, and C89 was more about codifying existing practice than establishing new ones. That's why you see this on a modern 64-bit platform, e.g., my POWER9 workstation, where plain char is unsigned: If we change the original type explicitly to signed char on our POWER9 Linux machine, that's different: and, accounting for different sizes of int, seems similar on PRO/VENIX V2.0 (again, which is System V): but the exact same program on PRO/VENIX Rev. 2.0 behaves a bit differently: The differences in int size we expect, but there's other kinds of weird stuff going on here. The PRO/VENIX manual lists all the various permutations about type conversions and what gets turned into what where, but since the manual is already wrong about unsigned char I don't think we can trust the documentation for this part either. Our best bet is to move values into int and mask off any propagated sign bits before doing comparisons or math, which is agonizing, but reliable. That means throwing around a lot of seemingly superfluous & 0xff to make sure we don't get negative numbers where we don't want them. Once I got it built, however, there were lots of bugs. Many were because it turns out the compiler isn't too good with 32-bit long, which is not a native type on the 16-bit PDP-11. This (part of the NTP client) worked on my regular Linux desktop, but didn't work in Venix: The first problem is that the intermediate shifts are too large and overshoot, even though they should be in range for a long. Consider this example: On the POWER9, accounting for the different semantics of %lx, But on Venix, the second shift blows out the value. We can get an idea of why from the generated assembly in the adb debugger (here from PRO/VENIX V2.0, since I could cut and paste from the Kermit session): (Parenthetical notes: csav is a small subroutine that pushes volatiles r2 through r4 on the stack and turns r5 into the frame pointer; the corresponding cret unwinds this. The initial branch in this main is used to reserve additional stack space, but is often practically a no-op.) The first shift is here at ~main+024. Remember the values are octal, so 010 == 8. r0 is 16 bits wide — no 32-bit registers — so an eight-bit shift is fine. When we get to the second shift, however, it's the same instruction on just one register (030 == 24) and the overflow is never checked. In fact, the compiler never shifts the second part of the long at all. The result is thus zero. The second problem in this example is that the compiler never treats the constant as a long even though statically there's no way it can fit in a 16-bit int. 
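(As an aside, here is roughly what that masking style looks like in practice: a bytewise RFC 1071 Internet checksum written the way the constraints above force it, with plain signed chars masked by & 0xff before any arithmetic and, to stay clear of the long-shift problem just described, multiplication and division standing in for shifts. This is an illustrative sketch of the idea, not the actual routine from the stack.)

#include <stdio.h>

long cksum(buf, len)
char *buf;
int len;
{
    long sum;
    int i;

    sum = 0L;
    for (i = 0; i + 1 < len; i += 2) {
        sum += (long)(buf[i] & 0xff) * 256L;    /* high byte of the word */
        sum += (long)(buf[i + 1] & 0xff);       /* low byte */
    }
    if (i < len)                                /* odd trailing byte */
        sum += (long)(buf[i] & 0xff) * 256L;
    while (sum > 0xffffL)                       /* fold carries back in */
        sum = (sum & 0xffffL) + sum / 65536L;
    return (~sum) & 0xffffL;
}

int main()
{
    /* a made-up 8-byte "header" with its checksum field zeroed */
    char hdr[8];
    long ck;

    hdr[0] = 0x45; hdr[1] = 0x00; hdr[2] = 0x00; hdr[3] = 0x08;
    hdr[4] = 0x00; hdr[5] = 0x00; hdr[6] = 0x00; hdr[7] = 0x00;
    ck = cksum(hdr, 8);
    hdr[6] = (char)(ck / 256L);                 /* store it big-endian */
    hdr[7] = (char)(ck & 0xff);
    printf("checksum %04lx, verify %04lx (should be 0000)\n",
           ck, cksum(hdr, 8));
    return 0;
}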
To get around those two gotchas on both Venices here, I rewrote it this way: An alternative to a second variable is to explicitly mark the epoch constant itself as long, e.g., by casting it, which also works. Here's another example for your entertainment. At least some sort of pseudo-random number generator is crucial, especially for TCP when selecting the pseudo-source port and initial sequence numbers, or otherwise Slirp seemed to get very confused because we would "reuse" things a lot. Unfortunately, the obvious typical idiom to seed it like srand(time(NULL)) doesn't work: srand() expects a 16-bit int but time(NULL) returns a 32-bit long, and it turns out the compiler only passes the 16 most significant bits of the time — i.e., the ones least likely to change — to srand(). Here's the disassembly as proof (contents trimmed for display here; since this is a static binary, we can see everything we're calling): At the time we call the glue code for time from main, the value under the stack pointer (i.e., r6) is cleared immediately beforehand since we're passing NULL (at ~main+06). We then invoke the system call, which per the Venix manual for time(2) uses two registers for the 32-bit result, namely r0 (high bits) and r1 (low bits). We passed a null pointer, so the values remain in those registers and aren't written anywhere (branch at _time+014). When we return to ~main+014, however, we only put r0 on the stack for srand (remember that r5 is being used as the frame pointer; see the disassembly I provided for csav) and r1 is completely ignored. Why would this happen? It's because time(2) isn't declared anywhere in /usr/include or /usr/include/sys (the two C include directories), nor for that matter rand(3) or srand(3). This is true of both Rev. 2.0 and V2.0. Since the symbols are statically present in the standard library, linking will still work, but since the compiler doesn't know what it's supposed to be working with, it assumes int and fails to handle both halves of the long. One option is to manually declare everything ourselves. However, from the assembly at _time+016 we do know that if we pass a pointer, the entire long value will get placed there. That means we can also do this: Now this gets the lower bits and there is sufficient entropy for our purpose (though obviously not a cryptographically-secure PRNG). Interestingly, the Venix manual recommends using the time as the seed, but doesn't include any sample code. At any rate this was enough to make the pieces work for IP, ICMP and UDP, but TCP would bug out after just a handful of packets. As it happens, Venix has rather small serial buffers by modern standards: tty(7), based on the TIOCQCNT ioctl(2), appears to have just a 256-byte read buffer (sg_ispeed is only char-sized). If we don't make adjustments for this, we'll start losing framing when the buffer gets overrun, as in this extract from a test build with debugging dumps on and a maximum segment size/window of 512 bytes. Here, the bytes marked by dashes are the remote end and the bytes separated by dots are what the SLIP driver is scanning for framing and/or throwing away; you'll note there is obvious ASCII data in them. If we make the TCP MSS and window on our client side 256 bytes, there is still retransmission, but the connection is more reliable since overrun occurs less often and seems to work better than a hard cap on the maximum transmission unit (e.g., "mtu 256") from SLiRP's side. 
The only consequence of dropping the TCP MSS and window size is that the TCP client is currently hard-coded to just send one packet at the beginning (this aligns with how you'd do finger, HTTP/1.x, gopher, etc.), and that datagram uses the same size, which necessarily limits how much can be sent. If I did the extra work to split this over several datagrams, it obviously wouldn't be a problem anymore, but I'm lazy and worse is better! The connection can be made somewhat more reliable still by improving the SLIP driver's notion of framing. RFC 1055 only specifies that the SLIP end byte (i.e., $c0) occur at the end of a SLIP datagram, though it also notes that it was proposed very early on that it could also start datagrams — i.e., if two occur back to back, then it just looks like a zero-length or otherwise obviously invalid entity which can be trivially discarded. However, since there's no guarantee or requirement that the remote link will do this, we can't assume it either. We also can't just look for a $45 byte (i.e., IPv4 with a 20-byte header) because that's an ASCII character and appears frequently in text payloads. However, $45 followed by a valid DSCP/ECN byte is much less frequent, and most of the time this byte will be either $00, $08 or $10; we don't currently support ECN (maybe we should) and we wouldn't find other DSCP values meaningful anyway. The SLIP driver uses these sequences to find the start of a datagram and $c0 to end it. While that doesn't solve the overflow issue, it means the SLIP driver will be less likely to go out of framing when the buffer does overrun and thus can better recover when the remote side retransmits. And, well, that's it. There are still glitches to bang out but it's good enough to grab Hacker News. To build it, go into the src/ directory, run configure and then run make (parallel make is fine, I use -j24 on my POWER9). Connect your two serial ports together with a null modem, which I assume will be /dev/ttyUSB0 and /dev/ttyUSB1. Start Slirp-CK with a command line like ./slirp -b 4800 "tty /dev/ttyUSB1" but adjusting the baud and path to your serial port. Take note of the specified virtual and nameserver addresses: Unlike the given directions, you can just kill it with Control-C when you're done; the five zeroes are only if you're running your connection over standard output such as direct shell dial-in (this is a retrocomputing blog so some of you might). To see the debug version in action, next go to the BASS directory and just do a make. You'll get a billion warnings but it should still work with current gcc and clang because I specifically request -std=c89. If you use a different path for your serial port (i.e., not /dev/ttyUSB0), edit slip.c before you compile. You don't do anything like ifconfig with these tools; you always provide the tools the client IP address they'll use (or create an alias or script to do so). Try this initial example, with slirp already running: Because I'm super-lazy, you separate the components of the IPv4 address with spaces, not dots. In Slirp-land, 10.0.2.2 is always the host you are connected to. You can see the ICMP packet being sent, the bytes being scanned by the SLIP driver for framing (the ones with dots), and then the reply (with dashes). These datagram dumps have already been pre-processed for SLIP metabytes. Unfortunately, you may not be able to ping other hosts through Slirp because there's no backroute, but you could try this with a direct SLIP connection, an exercise left for the reader.
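To make the framing heuristic described earlier concrete, here is a sketch of that kind of scan (a hypothetical helper, not the actual slip.c code):

    #define SLIP_END 0xc0  /* RFC 1055 frame delimiter */

    /* Return the offset of a plausible IPv4 datagram start in buf, or -1 if
     * none is found: a 0x45 version/IHL byte followed by one of the common
     * DSCP/ECN bytes ($00, $08 or $10). */
    int slip_find_start(buf, len)
    unsigned char *buf;
    int len;
    {
        int i;

        for (i = 0; i + 1 < len; i++) {
            if (buf[i] == 0x45 &&
                (buf[i+1] == 0x00 || buf[i+1] == 0x08 || buf[i+1] == 0x10))
                return i;
        }
        return -1;
    }

The driver can then treat everything from that offset up to the next $c0 as one candidate datagram, resynchronizing if the buffer was overrun.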
If Slirp doesn't want to respond and you're sure your serial port works (try testing both ends with Kermit?), you can recompile it with -DDEBUG (change this in the generated Makefile) and pass your intended debug level like -d 1 or -d 3. You'll get a file called slirp_debug with some agonizingly detailed information so you can see if it's actually getting the datagrams and/or liking the datagrams it gets. For nslookup, ntp and minisock, the second address becomes your accessible recursive nameserver (or use -i to provide an IP). The DNS dump is also given in the debug mode with slashes for the DNS answer section. nslookup and ntp are otherwise self-explanatory. minisock takes a server name (or IP) and port, followed by optional strings. The strings, up to 255 characters total (in this version), are immediately sent with CR-LFs between them unless you specify -n. If you specify no strings, none are sent. It then waits on that port for data and exits when the socket closes. This is how we did the HTTP/1.0 requests in the screenshots. On the DEC Pro, this has been tested on my trusty DEC Professional 380 running PRO/VENIX V2.0. It should compile and run on a 325 or 350, and on at least PRO/VENIX Rev. 2.0, though I don't have any hardware for this and Xhomer's serial port emulation is not good enough for this purpose (so unfortunately you'll need a real DEC Pro until I or Tarek get around to fixing it). The easiest way to get it over there is Kermit. Assuming you have this already, connect your host and the Pro on the "real" serial port at 9600bps. Make sure both sides are set to binary and just push all the files over (except the Markdown documentation unless you really want), and then do a make -f Makefile.venix (it may have been renamed to makefile.venix; adjust accordingly). Establishing the link is as simple as connecting your server's serial port to the other end of the BCC05 or equivalent from the Pro and starting Slirp to talk to that port (on my system, it's even the same port, so the same command line suffices). If you experience issues with the connection, the easiest fix is to just bounce Slirp — because there are no timeouts, there are also no retransmits. I don't know if this is hitting bugs in Slirp or in my code, though it's probably the latter. Nevertheless, I've been able to run stuff most of the day without issue. It's nice to have a simple network option and the personal satisfaction of having written it myself. There are many acknowledged deficiencies, mostly because I assume little about the system itself and tried to keep everything very simplistic. There are no timeouts and thus no retransmits, and if you break the TCP connection in the middle there will be no proper teardown. Also, because I used Slirp for the other side (as many others will), and because my internal network is full of machines that have no idea what IPv6 is, there is no IPv6 support. I agree there should be, and SLIP doesn't care whether it carries IPv4 or IPv6, but for now that would require patching Slirp, which is a job I just don't feel up to at the moment. I'd also like to support at least CSLIP in the future. In the meantime, if you want to try this on other operating systems, the system-dependent portions are in compat.h and slip.c, with a small amount in ntp.c for handling time values. You will likely want to change where your serial ports are, the speed they run at, and how to make that port "raw" in slip.c.
You should also add any extra #includes to compat.h that your system requires. I'd love to hear about it running in other places. Slirp-CK remains under the original modified Slirp license and BASS is under the BSD 2-clause license. You can get Slirp-CK and BASS at GitHub.

COMPUTE!'s Gazette revived for July 2025

COMPUTE!'s Gazette was for many years the leading Commodore-specific magazine. I liked Ahoy! and RUN, and I subscribed to Loadstar too, but Gazette had the most interesting type-ins and the most extensive coverage. They were also the last of COMPUTE!'s machine-specific magazines and one of the longest-lived Commodore publications, period: yours truly had some articles published in COMPUTE (no exclamation point by then) Gazette as a youthful freelancer in the 1990s until General Media eventually made Gazette disk-only and then halted it entirely in 1995. I remember pitching Tom Netzel on a column idea and getting a cryptic E-mail back from him saying that "things were afoot." What was afoot was General Media divesting the entire publication to Ziff-Davis, who was only interested in the mailing list, and I got a wholly inadequate subscription to PC Magazine in exchange which I mostly didn't read and eventually didn't renew. This week I saw an announcement about a rebooted Gazette — even with a print edition, and restoring the classic ABC/Cap Cities trade dress — slated for release in July. I'm guessing that "president and founder [sic]" Edwin Nagle either bought or licensed the name from Ziff-Davis when forming the new COMPUTE! Media; the announcement also doesn't say if he only has rights to the name, or if he actually has access to the back catalogue, which I think could be more lucrative: since there appears to be print capacity, it seems like there could be some money in low-run back issue reprints or even reissuing some of their disk products, assuming any residual or royalty arrangements could be dealt with. I should say for the record that I don't have anything to do with the company myself and I don't know Nagle personally. By and large I naturally think this is a good thing, and I'll probably try to get a copy, though the stated aim of the magazine is more COMPUTE! and less Gazette since it intends to cover the entire retro community. Doing so may be the only way to ensure an adequate amount of content at a monthly cadence, so I get the reasoning, but it necessarily won't be the Gazette you remember. Also, since most retro enthusiasts have some means to push downloaded data to their machines, the type-in features which took up the majority of pages in the 1980s will almost certainly be diminished or absent. I suspect you'll see something more like the General Media incarnation, which was a few type-ins slotted between various regular columns, reviews and feature articles. The print rate strikes me as very reasonable at $9.95/mo for a low-volume rag and I hope they can keep that up, though they would need to be finishing the content for layout fairly soon and the only proffered sample articles seem to be on their blog. I'm at most cautiously optimistic right now, but the fact they're starting up at all is nice to see, and I hope it goes somewhere.

MacLynx beta 6: back to the Power Mac

See my prior articles for more of the history, but briefly, MacLynx is a throwback port of the venerable Lynx 2.7.1 to the classic Mac OS, last updated in 1997, which I picked up again in 2020. Rather than try to replicate its patches against a more current Lynx which may not even build, I've been improving its interface and Mac integration along with the browser core, incorporating later code and patching the old stuff. However, beta 6 is not a fat binary — the two builds are intentionally separate. One reason is so I can use a later CodeWarrior for better code that didn't have to support 68K, but the main one is to allow different code on Power Macs which may be expensive or infeasible on 68K Macs. The primary use case for this — which may occur as soon as the next beta — is adding a built-in vendored copy of Crypto Ancienne for onboard TLS without a proxy. On all but upper-tier 68040s, setting up the TLS connection takes longer than many servers will wait, but even the lowliest Performa 6100 with a bottom-of-the-barrel 60MHz 601 can do so reasonably quickly. The port did not go altogether smoothly. While Olivier Gutknecht's original fat binary worked fine on Power Macs, it took quite a while to get all the pieces reassembled on a later CodeWarrior with a later version of GUSI, the Mac POSIX glue layer which is a critical component (the Power Mac version uses 2.2.3, the 68K version uses 1.8.0). Some functions had changed and others were missing and had to be rewritten with later alternatives. One particularly obnoxious glitch was due to a conflict between the later GUSI's time.h and Apple Universal Interfaces' Time.h (remember, HFS+ is case-insensitive) which could not be solved by changing the search order in the project due to other conflicting headers. The simplest solution was to copy Time.h into the project and name it something else! Even after that, though, basic Mac GUI operations like popping open the URL dialogue would cause it to crash. Can you figure out why? Here's a hint: on a Power Mac, your application itself was almost certainly fully native. However, a certain amount of the Toolbox and the Mac OS retained 68K code, even in the days of Classic under Mac OS X, and your PowerPC application would invariably hit one of these routines eventually. The component responsible for switching between ISAs is the Mixed Mode Manager, which is tightly integrated with the 68K emulator and bridges the two architectures' different calling conventions, marshalling their parameters (PowerPC in registers, 68K on the stack) and managing return addresses. I'm serious when I say the normal state is to run 68K code: 68K code is necessarily the first-class citizen in Mac OS, even in PowerPC-only versions, because to run 68K apps seamlessly they must be able to call any 68K routine directly. All the traps that 68K apps use must also look like 68K code to them — and PowerPC apps often use those traps, too, because they're fundamental to the operating system. 68K apps can and do call code fragments in either ISA using the Code Fragment Manager (and PowerPC apps are obliged to), but the system must still be able to run non-CFM apps that are unaware of its existence. To jump to native execution thus requires an additional step. Say a 68K app running in emulation calls a function in the Toolbox which used to be 68K, but is now PowerPC. On a 68K MacOS, this is just 68K code. In later versions, this is replaced by a routine descriptor with a special trap meaningful only to the 68K emulator.
This descriptor contains the destination calling convention and a pointer to the PowerPC function's transition vector, which has both the starting address of the code fragment and the initial value for the TOC environment register. The MMM converts the parameters to a PowerOpen ABI call according to the specified convention and moves the return address into the PowerPC link register, and upon conclusion converts the result back and unwinds the stack. The same basic idea works for 68K code calling a PowerPC routine. Unfortunately, we forgot to make a descriptor for this and other routines the Toolbox modal dialogue routine expected to call, so the nanokernel remains in 68K mode trying to execute them and makes a big mess. (It's really hard to debug it when this happens, too; the backtrace is usually totally thrashed.) I said last time that my idea with MacLynx is to surround the text core with the Mac interface. Lynx keys should still work and it should still act like Lynx, but once you move to a GUI task you should stay in the GUI until that task is completed. In beta 5, I added support for the Standard File package so you get a requester instead of entering a filename, but once you do this you still need to manually select "Save to disk" inside Lynx. That changes in beta 6. (In MacOS pathnames, :: refers to the parent folder.) Resizing, scrolling and repainting are also improved. The position of the thumb in MacLynx's scrollbar is now implemented using a more complex yet more dynamic algorithm which should also respond more properly to resize events. A similar change fixes scroll wheels with USB Overdrive. When MacLynx's default window opens, a scrollbar control is artificially added to it. USB Overdrive implements its scrollwheel support by finding the current window's scrollbar, if any, and emulating clicks on its up and down (or left and right) buttons as the wheel is moved. This works fine in MacLynx, at least initially. When the window is resized, however, USB Overdrive seems to lose track of the scrollbar, which causes its scrollwheel functionality to stop working. The solution was to destroy and rebuild the scrollbar after the window takes its new dimensions, like what happens on startup when the window first opens. This little song and dance may also fix other scrollwheel extensions. Always keep in mind that the scrollbar is actually used as a means to send commands to Lynx to change its window on the document; it isn't scrolling, say, a pre-rendered GWorld. This causes the screen to be redrawn quite frequently, and big window sizes tend to chug. You can also outright crash the browser with large window widths: this is difficult to do on a 68K Mac with on-board video where the maximum screen size isn't that large, but on my 1920x1080 G4 I can do so reliably. The display character set is now also fixed, making the corresponding setting in lynx.cfg a no-op. However, if you are intentionally using another character set and this will break you, please feel free to plead your use case to me and I will consider it. Another bug fixed was an infinite loop that could trigger during UTF-8 conversion of certain text strings. These sorts of bugs are also a big pain to puzzle out because all you can do from CodeWarrior is force a trap with an NMI, leaving the debugger's view of the program counter likely near but probably not at the scene of the foul. Eventually I single-stepped from a point near the actual bug and was able to see what was happening, and it turned out to be a very stupid bug on my part, and that's all I'm going to say about that.
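Back on the Mixed Mode side, here is a hypothetical illustration of the routine-descriptor mechanism described above (not MacLynx's actual code; the UPP types and uppModalFilterProcInfo come from the Universal Interfaces):

    #include <Dialogs.h>
    #include <Events.h>
    #include <MixedMode.h>

    /* A native ModalFilterProc, wrapped in a routine descriptor so the 68K
     * side of the Dialog Manager can call into PowerPC code. */
    static pascal Boolean MyFilter(DialogPtr dlg, EventRecord *event, short *itemHit)
    {
        return false;   /* let the Dialog Manager handle the event itself */
    }

    ModalFilterUPP GetMyFilterUPP(void)
    {
        /* uppModalFilterProcInfo encodes the Pascal calling convention so the
         * Mixed Mode Manager knows how to marshal the parameters. */
        return (ModalFilterUPP)NewRoutineDescriptor((ProcPtr)MyFilter,
                                                    uppModalFilterProcInfo,
                                                    GetCurrentArchitecture());
    }

On a plain 68K build the UPP degenerates to the bare ProcPtr, which is why a missing descriptor only bites on the Power Mac side.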
With the SameSite and HttpOnly attributes (irrelevant on Lynx but supported for completeness) now handled, the next problem was that any cookie with an expiration value — which nowadays is nearly any login cookie — wouldn't stick. The problem turned out to be the difference in how the classic MacOS handles time values. In 32-bit Un*xy things, including Mac OS X, time_t is a signed 32-bit integer with an epoch starting on Thursday, January 1, 1970. In the classic MacOS, time_t is an unsigned 32-bit integer with an epoch starting on Friday, January 1, 1904. (This is also true for timestamps in HFS+ filesystems, even in Mac OS X and modern macOS, but not APFS.) Lynx has a utility function that can convert an ASCII date string into a seconds-past-the-epoch count, but in case you haven't guessed, this function defaults to the Unix epoch. In fact, the version previously in MacLynx only supported the Unix epoch. That means when converted into seconds after the epoch, the cookie expiration value would always appear to be in the past compared to the MacOS time value which, being based on a much earlier epoch, will always be much larger — and thus MacLynx would conclude the cookie was actually expired and politely clear it. I reimplemented this function based on the MacOS epoch, and now login cookies actually let you log in! Unfortunately other cookies like trackers can be set too, and this is why we can't have nice things. Sorry. At least they don't persist between runs of the browser. Even then, though, there's still some additional time fudging because time(NULL) on my Quadra 800 running 8.1 and time(NULL) on my G4 MDD running 9.2.2, despite their clocks being synchronized to the same NTP source down to the second, yielded substantially different values. Both of these calls should go to the operating system and use the standard Mac epoch, and not through GUSI, so GUSI can't be why. For the time being I use a second fudge factor if we get an outlandish result before giving up. I'm still trying to figure out why this is necessary. Handing images off to a viewer didn't work for PNG images before because MacLynx was using the wrong internal MIME type, which is now fixed. (Ignore the MIME types in the debug window because that's actually a problem I noticed with my Internet Config settings, not MacLynx. Fortunately Picture Viewer will content-sniff, so it figures it out.) Finally, there is also miscellaneous better status code and redirect handling (again not a problem with mainline Lynx, just our older fork here), which makes login and browsing sites more streamlined, and you can finally press Shift-Tab to cycle backwards through forms and links. If you want to build MacLynx from source, building beta 6 is largely the same on 68K with the same compiler and prerequisites except that builds are now segregated to their own folders and you will need to put a copy of lynx.cfg in with them (the StuffIt source archive does not have aliases predone for you). For the PowerPC version, you'll need the same setup but substituting CodeWarrior Pro 7.1, and, like CWGUSI, GUSI 2.2.3 should be in the same folder or volume that contains the MacLynx source tree. There are debug and optimized builds for each architecture. Pre-built binaries and source are available from the main MacLynx page. MacLynx, like Lynx, is released under the GNU General Public License v2.
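For reference, the epoch gap discussed above is a fixed constant; a minimal sketch of the conversion (illustrative helper names, not MacLynx's actual function):

    /* The classic MacOS epoch (1904-01-01) precedes the Unix epoch
     * (1970-01-01) by 2,082,844,800 seconds: 66 years, 17 of them leap years. */
    #define MAC_UNIX_EPOCH_DELTA 2082844800UL

    unsigned long unix_to_mac_time(unsigned long unix_secs)
    {
        return unix_secs + MAC_UNIX_EPOCH_DELTA;   /* seconds since 1970 -> since 1904 */
    }

    unsigned long mac_to_unix_time(unsigned long mac_secs)
    {
        return mac_secs - MAC_UNIX_EPOCH_DELTA;    /* underflows for dates before 1970 */
    }

A cookie expiry computed against the wrong epoch lands about 66 years in the past, which is exactly why every dated cookie looked expired.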


More in technology

2025-05-11 air traffic control

Air traffic control has been in the news lately, on account of my country's declining ability to do it. Well, that's a long-term trend, resulting from decades of under-investment, severe capture by our increasingly incompetent defense-industrial complex, no small degree of management incompetence in the FAA, and long-lasting effects of Reagan crushing the PATCO strike. But that's just my opinion, you know, maybe airplanes got too woke. In any case, it's an interesting time to consider how weird parts of air traffic control are. The technical, administrative, and social aspects of ATC all seem two notches more complicated than you would expect. ATC is heavily influenced by its peculiar and often accidental development, a product of necessity that perpetually trails behind the need, and a beneficiary of hand-me-down military practices and technology. Aviation Radio In the early days of aviation, there was little need for ATC---there just weren't many planes, and technology didn't allow ground-based controllers to do much of value. There was some use of flags and signal lights to clear aircraft to land, but for the most part ATC had to wait for the development of aviation radio. The impetus for that work came mostly from the First World War. Here we have to note that the history of aviation is very closely intertwined with the history of warfare. Aviation technology has always rapidly advanced during major conflicts, and as we will see, ATC is no exception. By 1913, the US Army Signal Corps was experimenting with the use of radio to communicate with aircraft. This was pretty early in radio technology, and the aircraft radios were huge and awkward to operate, but it was also early in aviation and "huge and awkward to operate" could be similarly applied to the aircraft of the day. Even so, radio had obvious potential in aviation. The first military application for aircraft was reconnaissance. Pilots could fly past the front to find artillery positions and otherwise provide useful information, and then return with maps. Well, even better than returning with a map was providing the information in real-time, and by the end of the war medium-frequency AM radios were well developed for aircraft. Radios in aircraft led naturally to another wartime innovation: ground control. Military personnel on the ground used radio to coordinate the schedules and routes of reconnaissance planes, and later to inform on the positions of fighters and other enemy assets. Without any real way to know where the planes were, this was all pretty primitive, but it set the basic pattern that people on the ground could keep track of aircraft and provide useful information. Post-war, civil aviation rapidly advanced. The early 1920s saw numerous commercial airlines adopting radio, mostly for business purposes like schedule coordination. Once you were in contact with someone on the ground, though, it was only logical to ask about weather and conditions. Many of our modern practices like weather briefings, flight plans, and route clearances originated as more or less formal practices within individual airlines. Air Mail The government was not left out of the action. The Post Office operated what may have been the largest commercial aviation operation in the world during the early 1920s, in the form of Air Mail. The Post Office itself did not have any aircraft; all of the flying was contracted out---initially to the Army Air Service, and later to a long list of regional airlines.
Air Mail was considered a high priority by the Post Office and proved very popular with the public. When the transcontinental route began proper operation in 1920, it became possible to get a letter from New York City to San Francisco in just 33 hours by transferring it between airplanes in a nearly non-stop relay race. The Post Office's largesse in contracting the service to private operators provided not only the funding but the very motivation for much of our modern aviation industry. Air travel was not very popular at the time, being loud and uncomfortable, but the mail didn't complain. The many contract mail carriers of the 1920s grew and consolidated into what are now some of the United States' largest companies. For around a decade, the Post Office almost singlehandedly bankrolled civil aviation, and passengers were a side hustle [1]. Air mail ambition was not only of economic benefit. Air mail routes were often longer and more challenging than commercial passenger routes. Transcontinental service required regular flights through sparsely populated parts of the interior, challenging the navigation technology of the time and making rescue of downed pilots a major concern. Notably, air mail operators did far more nighttime flying than any other commercial aviation in the 1920s. The post office became the government's de facto technical leader in civil aviation. Besides the network of beacons and markers built to guide air mail between cities, the post office built 17 Air Mail Radio Stations along the transcontinental route. The Air Mail Radio Stations were the company radio system for the entire air mail enterprise, and the closest thing to a nationwide, public air traffic control service to then exist. They did not, however, provide what we would now call control. Their role was mainly to provide pilots with information (including, critically, weather reports) and to keep loose tabs on air mail flights so that a disappearance would be noticed in time to send search and rescue. In 1926, the Watres Act created the Aeronautic Branch of the Department of Commerce. The Aeronautic Branch assumed a number of responsibilities, but one of them was the maintenance of the Air Mail routes. Similarly, the Air Mail Radio Stations became Aeronautics Branch facilities, and took on the new name of Flight Service Stations. No longer just for the contract mail carriers, the Flight Service Stations made up a nationwide network of government-provided services to aviators. They were the first edifices in what we now call the National Airspace System (NAS): a complex combination of physical facilities, technologies, and operating practices that enable safe aviation. In 1935, the first en-route air traffic control center opened, a facility in Newark owned by a group of airlines. The Aeronautic Branch, since renamed the Bureau of Air Commerce, supported the airlines in developing this new concept of en-route control that used radio communications and paperwork to track which aircraft were in which airways. The rising number of commercial aircraft made in-air collisions a bigger problem, so the Newark control center was quickly followed by more facilities built on the same pattern. In 1936, the Bureau of Air Commerce took ownership of these centers, and ATC became a government function alongside the advisory and safety services provided by the flight service stations. 
En route center controllers worked off of position reports from pilots via radio, but needed a way to visualize and track aircraft's positions and their intended flight paths. Several techniques helped: first, airlines shared their flight planning paperwork with the control centers, establishing "flight plans" that corresponded to each aircraft in the sky. Controllers adopted a work aid called a "flight strip," a small piece of paper with the key information about an aircraft's identity and flight plan that could easily be handed between stations. By arranging the flight strips on display boards full of slots, controllers could visualize the ordering of aircraft in terms of altitude and airway. Second, each center was equipped with a large plotting table map where controllers pushed markers around to correspond to the position reports from aircraft. A small flag on each marker gave the flight number, so it could easily be correlated to a flight strip on one of the boards mounted around the plotting table. This basic concept of air traffic control, of a flight strip and a position marker, is still in use today. Radar The Second World War changed aviation more than any other event in history. Among the many advancements were two British inventions of particular significance: first, the jet engine, which would make modern passenger airliners practical. Second, radar, and more specifically the magnetron. This was a development of such significance that the British government treated it as a secret akin to nuclear weapons; indeed, the UK effectively traded radar technology to the US in exchange for participation in US nuclear weapons research. Radar created radical new possibilities for air defense, and complemented previous air defense development in Britain. During WWI, the organization tasked with defending London from aerial attack had developed a method called "ground-controlled interception" or GCI. Under GCI, ground-based observers identify possible targets and then direct attack aircraft towards them via radio. The advent of radar made GCI tremendously more powerful, allowing a relatively small number of radar-assisted air defense centers to monitor for inbound attack and then direct defenders with real-time vectors. In the first implementation, radar stations reported contacts via telephone to "filter centers" that correlated tracks from separate radars to create a unified view of the airspace---drawn in grease pencil on a preprinted map. Filter center staff took radar and visual reports and updated the map by moving the marks. This consolidated information was then provided to air defense bases, once again by telephone. Later technical developments in the UK made the process more automated. The invention of the "plan position indicator" or PPI, the type of radar scope we are all familiar with today, made the radar far easier to operate and interpret. Radar sets that automatically swept over 360 degrees allowed each radar station to see all activity in its area, rather than just aircraft passing through a defensive line. These new capabilities eliminated the need for much of the manual work: radar stations could see attacking aircraft and defending aircraft on one PPI, and communicated directly with defenders by radio. It became routine for a radar operator to give a pilot navigation vectors by radio, based on real-time observation of the pilot's position and heading. A controller took strategic command of the airspace, effectively steering the aircraft from a top-down view.
The ease and efficiency of this workflow were a significant factor in the end of the Battle of Britain, and its remarkable efficacy was noticed in the US as well. At the same time, changes were afoot in the US. WWII was tremendously disruptive to civil aviation; while aviation technology rapidly advanced due to wartime needs, those same pressing demands led to a slowdown in nonmilitary activity. A heavy volume of military logistics flights and flight training, as well as growing concerns about defending the US from an invasion, meant that ATC was still a priority. A reorganization of the Bureau of Air Commerce replaced it with the Civil Aeronautics Authority, or CAA. The CAA's role greatly expanded as it assumed responsibility for airport control towers and commissioned new en route centers. As WWII came to a close, CAA en route control centers began to adopt GCI techniques. By 1955, the name Air Route Traffic Control Center (ARTCC) had been adopted for en route centers and the first air surveillance radars were installed. In a radar-equipped ARTCC, the map where controllers pushed markers around was replaced with a large tabletop PPI built to a Navy design. The controllers still pushed markers around to track the identities of aircraft, but they moved them based on their corresponding radar "blips" instead of radio position reports. Air Defense After WWII, post-war prosperity and wartime technology like the jet engine led to huge growth in commercial aviation. During the 1950s, radar was adopted by more and more ATC facilities (both "terminal" at airports and "en route" at ARTCCs), but there were few major changes in ATC procedure. With more and more planes in the air, tracking flight plans and their corresponding positions became labor intensive and error-prone. A particular problem was the increasing range and speed of aircraft, and correspondingly longer passenger flights, which meant that many aircraft passed from the territory of one ARTCC into another. This required that controllers "hand off" the aircraft, informing the "next" ARTCC of the flight plan and position at which the aircraft would enter their airspace. In 1956, 128 people died in a mid-air collision of two commercial airliners over the Grand Canyon. In 1958, 49 people died when a military fighter struck a commercial airliner over Nevada. These were not the only such incidents in the mid-1950s, and public trust in aviation started to decline. Something had to be done. First, in 1958 the CAA gave way to the Federal Aviation Administration. This was more than just a name change: the FAA's authority was greatly increased compared to the CAA, most notably by granting it authority over military aviation. This is a difficult topic to explain succinctly, so I will only give broad strokes. Prior to 1958, military aviation was completely distinct from civil aviation, with no coordination and often no communication at all between the two. This was, of course, a factor in the 1958 collision. Further, the 1956 collision, while it did not involve the military, did result in part from communications issues between separate CAA facilities and the airline's own control facilities. After 1958, ATC was completely unified into one organization, the FAA, which assumed the work of the military controllers of the time and some of the role of the airlines.
The military continues to have its own air controllers to this day, and military aircraft continue to include privileges such as (practical but not legal) exemption from transponder requirements, but military flights over the US are still beholden to the same ATC as civil flights. Some exceptions apply, void where prohibited, etc. The FAA's suddenly increased scope only made the practical challenges of ATC more difficult, and commercial aviation numbers continued to rise. As soon as the FAA was formed, it was understood that there needed to be major investments in improving the National Airspace System. While the first couple of years were dominated by the transition, the FAA's second director (Najeeb Halaby) prepared two lengthy reports examining the situation and recommending improvements. One of these, the Beacon report (also called Project Beacon), specifically addressed ATC. The Beacon report's recommendations included massive expansion of radar-based control (called "positive control" because of the controller's access to real-time feedback on aircraft movements) and new control procedures for airways and airports. Even better, for our purposes, it recommended the adoption of general-purpose computers and software to automate ATC functions. Meanwhile, the Cold War was heating up. US air defense, a minor concern in the few short years after WWII, became a higher priority than ever before. The Soviet Union had long-range aircraft capable of reaching the United States, and nuclear weapons meant that only a few such aircraft had to make it to cause massive destruction. Considering the vast size of the United States (and, considering the new unified air defense command between the United States and Canada, all of North America) made this a formidable challenge. During the 1950s, the newly minted Air Force worked closely with MIT's Lincoln Laboratory (an important center of radar research) and IBM to design a computerized, integrated, networked system for GCI. When the Air Force committed to purchasing the system, it was christened the Semi-Automated Ground Environment, or SAGE. SAGE is a critical juncture in the history of the computer and computer communications, the first system to demonstrate many parts of modern computer technology and, moreover, perhaps the first large-scale computer system of any kind. SAGE is an expansive topic that I will not take on here; I'm sure it will be the focus of a future article but it's a pretty well-known and well-covered topic. I have not so far felt like I had much new to contribute, despite it being the first item on my "list of topics" for the last five years. But one of the things I want to tell you about SAGE, that is perhaps not so well known, is that SAGE was not used for ATC. SAGE was a purely military system. It was commissioned by the Air Force, and its numerous operating facilities (called "direction centers") were located on Air Force bases along with the interceptor forces they would direct. However, there was obvious overlap between the functionality of SAGE and the needs of ATC. SAGE direction centers continuously received tracks from remote data sites using modems over leased telephone lines, and automatically correlated multiple radar tracks to a single aircraft. Once an operator entered information about an aircraft, SAGE stored that information for retrieval by other radar operators. 
When an aircraft with associated data passed from the territory of one direction center to another, the aircraft's position and related information were automatically transmitted to the next direction center by modem. One of the key demands of air defense is the identification of aircraft---any unknown track might be routine commercial activity, or it could be an inbound attack. The air defense command received flight plan data on commercial flights (and more broadly all flights entering North America) from the FAA and entered them into SAGE, allowing radar operators to retrieve "flight strip" data on any aircraft on their scope. Recognizing this interconnection with ATC, as soon as SAGE direction centers were being installed the Air Force started work on an upgrade called SAGE Air Traffic Integration, or SATIN. SATIN would extend SAGE to serve the ATC use-case as well, providing SAGE consoles directly in ARTCCs and enhancing SAGE to perform non-military safety functions like conflict warning and forward projection of flight plans for scheduling. Flight strips would be replaced by teletype output, and in general made less necessary by the computer's ability to filter the radar scope. Experimental trial installations were made, and the FAA participated readily in the research efforts. Enhancement of SAGE to meet ATC requirements seemed likely to meet the Beacon report's recommendations and radically improve ARTCC operations, sooner and cheaper than development of an FAA-specific system. As it happened, well, it didn't happen. SATIN became interconnected with another planned SAGE upgrade to the Super Combat Centers (SCC), deep underground combat command centers with greatly enhanced SAGE computer equipment. SATIN and SCC planners were so confident that the last three Air Defense Sectors scheduled for SAGE installation, including my own Albuquerque, were delayed under the assumption that the improved SATIN/SCC equipment should be installed instead of the soon-obsolete original system. SCC cost estimates ballooned, and the program's ambitions were reduced month by month until it was canceled entirely in 1960. Albuquerque never got a SAGE installation, and the Albuquerque air defense sector was eliminated by reorganization later in 1960 anyway. Flight Service Stations Remember those Flight Service Stations, the ones that were originally built by the Post Office? One of the oddities of ATC is that they never went away. FSS were transferred to the CAB, to the CAA, and then to the FAA. During the 1930s and 1940s many more were built, expanding coverage across much of the country. Throughout the development of ATC, the FSS remained responsible for non-control functions like weather briefing and flight plan management. Because aircraft operating under instrument flight rules must closely comply with ATC, the involvement of FSS in IFR flights is very limited, and FSS mostly serve VFR traffic. As ATC became common, the FSS gained a new and somewhat odd role: playing go-between for ATC. FSS were more numerous and often located in sparser areas between cities (while ATC facilities tended to be in cities), so especially in the mid-century, pilots were more likely to be able to reach an FSS than ATC. It was, for a time, routine for FSS to relay instructions between pilots and controllers. This is still done today, although improved communications have made the need much less common. 
As weather dissemination improved (another topic for a future post), FSS gained access to extensive weather conditions and forecasting information from the Weather Service. This connectivity is bidirectional; during the midcentury FSS not only received weather forecasts by teletype but transmitted pilot reports of weather conditions back to the Weather Service. Today these communications have, of course, been computerized, although the legacy teletype format doggedly persists. There has always been an odd schism between the FSS and ATC: they are operated by different departments, out of different facilities, with different functions and operating practices. In 2005, the FAA cut costs by privatizing the FSS function entirely. Flight service is now operated by Leidos, one of the largest government contractors. All FSS operations have been centralized to one facility that communicates via remote radio sites. While flight service is still available, increasing automation has made the stations far less important, and the general perception is that flight service is in its last years. Last I looked, Leidos was not hiring for flight service and the expectation was that they would never hire again, retiring the service along with its staff. Flight service does maintain one of my favorite internet phenomena, the phone number domain name: 1800wxbrief.com. One of the odd manifestations of the FSS/ATC schism and the FAA's very partial privatization is that Leidos maintains an online aviation weather portal that is separate from, and competes with, the Weather Service's aviationweather.gov. Since Flight Service traditionally has the responsibility for weather briefings, it is honestly unclear to what extent Leidos vs. the National Weather Service should be investing in aviation weather information services. For its part, the FAA seems to consider aviationweather.gov the official source, while it pays for 1800wxbrief.com. There's also weathercams.faa.gov, which duplicates a very large portion (maybe all?) of the weather information on Leidos's portal and some of the NWS's. It's just one of those things. Or three of those things, rather. Speaking of duplication due to poor planning... The National Airspace System Left in the lurch by the Air Force, the FAA launched its own program for ATC automation. While the Air Force was deploying SAGE, the FAA had mostly been waiting, and various ARTCCs had adopted a hodgepodge of methods ranging from one-off computer systems to completely paper-based tracking. By 1960 radar was ubiquitous, but different radar systems were used at different facilities, and correlation between radar contacts and flight plans was completely manual. The FAA needed something better, and with growing congressional support for ATC modernization, they had the money to fund what they called National Airspace System En Route Stage A. Further bolstering historical confusion between SAGE and ATC, the FAA decided on a practical, if ironic, solution: buy their own SAGE. In an upcoming article, we'll learn about the FAA's first fully integrated computerized air traffic control system. While the failed detour through SATIN delayed the development of this system, the nearly decade-long delay between the design of SAGE and the FAA's contract allowed significant technical improvements.
This "New SAGE," while directly based on SAGE at a functional level, used later off-the-shelf computer equipment including the IBM System/360, giving it far more resemblance to our modern world of computing than SAGE with its enormous, bespoke AN/FSQ-7. And we're still dealing with the consequences today! [1] It also laid the groundwork for the consolidation of the industry, with a 1930 decision that took air mail contracts away from most of the smaller companies and awarded them instead to the precursors of United, TWA, and American Airlines.

Sierpiński triangle? In my bitwise AND?

Exploring a peculiar bit-twiddling hack at the intersection of 1980s geek sensibilities.

Reverse engineering the 386 processor's prefetch queue circuitry

In 1985, Intel introduced the groundbreaking 386 processor, the first 32-bit processor in the x86 architecture. To improve performance, the 386 has a 16-byte instruction prefetch queue. The purpose of the prefetch queue is to fetch instructions from memory before they are needed, so the processor usually doesn't need to wait on memory while executing instructions. Instruction prefetching takes advantage of times when the processor is "thinking" and the memory bus would otherwise be unused. In this article, I look at the 386's prefetch queue circuitry in detail. One interesting circuit is the incrementer, which adds 1 to a pointer to step through memory. This sounds easy enough, but the incrementer uses complicated circuitry for high performance. The prefetch queue uses a large network to shift bytes around so they are properly aligned. It also has a compact circuit to extend signed 8-bit and 16-bit numbers to 32 bits. There aren't any major discoveries in this post, but if you're interested in low-level circuits and dynamic logic, keep reading. The photo below shows the 386's shiny fingernail-sized silicon die under a microscope. Although it may look like an aerial view of a strangely-zoned city, the die photo reveals the functional blocks of the chip. The Prefetch Unit in the upper left is the relevant block. In this post, I'll discuss the prefetch queue circuitry (highlighted in red), skipping over the prefetch control circuitry to the right. The Prefetch Unit receives data from the Bus Interface Unit (upper right) that communicates with memory. The Instruction Decode Unit receives prefetched instructions from the Prefetch Unit, byte by byte, and decodes the opcodes for execution. This die photo of the 386 shows the location of the registers. Click this image (or any other) for a larger version. The left quarter of the chip consists of stripes of circuitry that appears much more orderly than the rest of the chip. This grid-like appearance arises because each functional block is constructed (for the most part) by repeating the same circuit 32 times, once for each bit, side by side. Vertical data lines run up and down, in groups of 32 bits, connecting the functional blocks. To make this work, each circuit must fit into the same width on the die; this layout constraint forces the circuit designers to develop a circuit that uses this width efficiently without exceeding the allowed width. The circuitry for the prefetch queue uses the same approach: each circuit is 66 µm wide1 and repeated 32 times. As will be seen, fitting the prefetch circuitry into this fixed width requires some layout tricks. What the prefetcher does The purpose of the prefetch unit is to speed up performance by reading instructions from memory before they are needed, so the processor won't need to wait to get instructions from memory. Prefetching takes advantage of times when the memory bus is otherwise idle, minimizing conflict with other instructions that are reading or writing data. In the 386, prefetched instructions are stored in a 16-byte queue, consisting of four 32-bit blocks.2 The diagram below zooms in on the prefetcher and shows its main components. You can see how the same circuit (in most cases) is repeated 32 times, forming vertical bands. At the top are 32 bus lines from the Bus Interface Unit. These lines provide the connection between the datapath and external memory, via the Bus Interface Unit. 
These lines form a triangular pattern as the 32 horizontal lines on the right branch off and form 32 vertical lines, one for each bit. Next are the fetch pointer and the limit register, with a circuit to check if the fetch pointer has reached the limit. Note that the two low-order bits (on the right) of the incrementer and limit check circuit are missing. At the bottom of the incrementer, you can see that some bit positions have a blob of circuitry missing from others, breaking the pattern of repeated blocks. The 16-byte prefetch queue is below the incrementer. Although this memory is the heart of the prefetcher, its circuitry takes up a relatively small area. A close-up of the prefetcher with the main blocks labeled. At the right, the prefetcher receives control signals. The bottom part of the prefetcher shifts data to align it as needed. A 32-bit value can be split across two 32-bit rows of the prefetch buffer. To handle this, the prefetcher includes a data shift network to shift and align its data. This network occupies a lot of space, but there is no active circuitry here: just a grid of horizontal and vertical wires. Finally, the sign extend circuitry converts a signed 8-bit or 16-bit value into a signed 16-bit or 32-bit value as needed. You can see that the sign extend circuitry is highly irregular, especially in the middle. A latch stores the output of the prefetch queue for use by the rest of the datapath. Limit check If you've written x86 programs, you probably know about the processor's Instruction Pointer (EIP) that holds the address of the next instruction to execute. As a program executes, the Instruction Pointer moves from instruction to instruction. However, it turns out that the Instruction Pointer doesn't actually exist! Instead, the 386 has an "Advance Instruction Fetch Pointer", which holds the address of the next instruction to fetch into the prefetch queue. But sometimes the processor needs to know the Instruction Pointer value, for instance, to determine the return address when calling a subroutine or to compute the destination address of a relative jump. So what happens? The processor gets the Advance Instruction Fetch Pointer address from the prefetch queue circuitry and subtracts the current length of the prefetch queue. The result is the address of the next instruction to execute, the desired Instruction Pointer value. The Advance Instruction Fetch Pointer—the address of the next instruction to prefetch—is stored in a register at the top of the prefetch queue circuitry. As instructions are prefetched, this pointer is incremented by the prefetch circuitry. (Since instructions are fetched 32 bits at a time, this pointer is incremented in steps of four and the bottom two bits are always 0.) But what keeps the prefetcher from prefetching too far and going outside the valid memory range? The x86 architecture infamously uses segments to define valid regions of memory. A segment has a start and end address (known as the base and limit) and memory is protected by blocking accesses outside the segment. The 386 has six active segments; the relevant one is the Code Segment that holds program instructions. Thus, the limit address of the Code Segment controls when the prefetcher must stop prefetching.3 The prefetch queue contains a circuit to stop prefetching when the fetch pointer reaches the limit of the Code Segment. In this section, I'll describe that circuit. Comparing two values may seem trivial, but the 386 uses a few tricks to make this fast. 
The basic idea is to use 30 XOR gates to compare the bits of the two registers. (Why 30 bits and not 32? Since 32 bits are fetched at a time, the bottom bits of the address are 00 and can be ignored.) If the two registers match, all the XOR values will be 0, but if they don't match, an XOR value will be 1. Conceptually, connecting the XORs to a 32-input OR gate will yield the desired result: 0 if all bits match and 1 if there is a mismatch. Unfortunately, building a 32-input OR gate using standard CMOS logic is impractical for electrical reasons, as well as inconveniently large to fit into the circuit. Instead, the 386 uses dynamic logic to implement a spread-out NOR gate with one transistor in each column of the prefetcher. The schematic below shows the implementation of one bit of the equality comparison. The mechanism is that if the two registers differ, the transistor on the right is turned on, pulling the equality bus low. This circuit is replicated 30 times, comparing all the bits: if there is any mismatch, the equality bus will be pulled low, but if all bits match, the bus remains high. The three gates on the left implement XNOR; this circuit may seem overly complicated, but it is a standard way of implementing XNOR. The NOR gate at the right blocks the comparison except during clock phase 2. (The importance of this will be explained below.) This circuit is repeated 30 times to compare the registers. The equality bus travels horizontally through the prefetcher, pulled low if any bits don't match. But what pulls the bus high? That's the job of the dynamic circuit below. Unlike regular static gates, dynamic logic is controlled by the processor's clock signals and depends on capacitance in the circuit to hold data. The 386 is controlled by a two-phase clock signal.4 In the first clock phase, the precharge transistor below turns on, pulling the equality bus high. In the second clock phase, the XOR circuits above are enabled, pulling the equality bus low if the two registers don't match. Meanwhile, the CMOS switch turns on in clock phase 2, passing the equality bus's value to the latch. The "keeper" circuit keeps the equality bus held high unless it is explicitly pulled low, to avoid the risk of the voltage on the equality bus slowly dissipating. The keeper uses a weak transistor to keep the bus high while inactive. But if the bus is pulled low, the keeper transistor is overpowered and turns off. This is the output circuit for the equality comparison. This circuit is located to the right of the prefetcher. This dynamic logic reduces power consumption and circuit size. Since the bus is charged and discharged during opposite clock phases, you avoid steady current through the transistors. (In contrast, an NMOS processor like the 8086 might use a pull-up on the bus. When the bus is pulled low, you would end up with current flowing through the pull-up and the pull-down transistors. This would increase power consumption, make the chip run hotter, and limit your clock speed.) The incrementer After each prefetch, the Advance Instruction Fetch Pointer must be incremented to hold the address of the next instruction to prefetch. Incrementing this pointer is the job of the incrementer. (Because each fetch is 32 bits, the pointer is incremented by 4 each time. But in the die photo, you can see a notch in the incrementer and limit check circuit where the circuitry for the bottom two bits has been omitted.
Thus, the incrementer's circuitry increments its value by 1, so the pointer (with two zero bits appended) increases in steps of 4.) Building an incrementer circuit is straightforward, for example, you can use a chain of 30 half-adders. The problem is that incrementing a 30-bit value at high speed is difficult because of the carries from one position to the next. It's similar to calculating 99999999 + 1 in decimal; you need to tediously carry the 1, carry the 1, carry the 1, and so forth, through all the digits, resulting in a slow, sequential process. The incrementer uses a faster approach. First, it computes all the carries at high speed, almost in parallel. Then it computes each output bit in parallel from the carries—if there is a carry into a position, it toggles that bit. Computing the carries is straightforward in concept: if there is a block of 1 bits at the end of the value, all those bits will produce carries, but carrying is stopped by the rightmost 0 bit. For instance, incrementing binary 11011 results in 11100; there are carries from the last two bits, but the zero stops the carries. A circuit to implement this was developed at the University of Manchester in England way back in 1959, and is known as the Manchester carry chain. In the Manchester carry chain, you build a chain of switches, one for each data bit, as shown below. For a 1 bit, you close the switch, but for a 0 bit you open the switch. (The switches are implemented by transistors.) To compute the carries, you start by feeding in a carry signal at the right. The signal will go through the closed switches until it hits an open switch, and then it will be blocked.5 The outputs along the chain give us the desired carry value at each position. Concept of the Manchester carry chain, 4 bits. Since the switches in the Manchester carry chain can all be set in parallel and the carry signal blasts through the switches at high speed, this circuit rapidly computes the carries we need. The carries then flip the associated bits (in parallel), giving us the result much faster than a straightforward adder. There are complications, of course, in the actual implementation. The carry signal in the carry chain is inverted, so a low signal propagates through the carry chain to indicate a carry. (It is faster to pull a signal low than high.) But something needs to make the line go high when necessary. As with the equality circuitry, the solution is dynamic logic. That is, the carry line is precharged high during one clock phase and then processing happens in the second clock phase, potentially pulling the line low. The next problem is that the carry signal weakens as it passes through multiple transistors and long lengths of wire. The solution is that each segment has a circuit to amplify the signal, using a clocked inverter and an asymmetrical inverter. Importantly, this amplifier is not in the carry chain path, so it doesn't slow down the signal through the chain. The Manchester carry chain circuit for a typical bit in the incrementer. The schematic above shows the implementation of the Manchester carry chain for a typical bit. The chain itself is at the bottom, with the transistor switch as before. During clock phase 1, the precharge transistor pulls this segment of the carry chain high. During clock phase 2, the signal on the chain goes through the "clocked inverter" at the right to produce the local carry signal.
But this circuit still isn't enough for the desired performance. The incrementer uses a second carry technique in parallel: carry skip. The concept is to look at blocks of bits and allow the carry to jump over the entire block. The diagram below shows a simplified implementation of the carry skip circuit. Each block consists of 3 to 6 bits. If all the bits in a block are 1's, then the AND gate turns on the associated transistor in the carry skip line. This allows the carry skip signal to propagate (from left to right), a block at a time. When it reaches a block with a 0 bit, the corresponding transistor will be off, stopping the carry as in the Manchester carry chain.

The AND gates all operate in parallel, so the transistors are rapidly turned on or off in parallel. Then, the carry skip signal passes through a small number of transistors, without going through any logic. (The carry skip signal is like an express train that skips most stations, while the Manchester carry chain is the local train that stops at every station.) Like the Manchester carry chain, the implementation of carry skip needs precharge circuits on the lines, a keeper/amplifier, and clocked logic, but I'll skip the details.

An abstracted and simplified carry-skip circuit. The block sizes don't match the 386's circuit.

One interesting feature is the layout of the large AND gates. A 6-input AND gate is a large device, difficult to fit into one cell of the incrementer. The solution is that the gate is spread out across multiple cells. Specifically, the gate uses a standard CMOS NAND gate circuit with NMOS transistors in series and PMOS transistors in parallel. Each cell has an NMOS transistor and a PMOS transistor, and the chains are connected at the end to form the desired NAND gate. (Inverting the output produces the desired AND function.) This spread-out layout technique is unusual, but it keeps each bit's circuitry approximately the same size.

The incrementer circuitry was tricky to reverse engineer because of these techniques. In particular, most of the prefetcher consists of a single block of circuitry repeated 32 times, once for each bit. The incrementer, on the other hand, consists of four different blocks of circuitry, repeating in an irregular pattern. Specifically, one block starts a carry chain, a second block continues the carry chain, and a third block ends a carry chain. The block before the ending block is different (one large transistor to drive the last block), making four variants in total. This irregular pattern is visible in the earlier photo of the prefetcher.
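To round off the incrementer, here is a rough Python sketch of the carry-skip idea described above, layered on the same carry computation. The uniform six-bit blocks and the function names are arbitrary choices of mine, not the 386's actual grouping of 3 to 6 bits.

# Carry-skip sketch: a block of all-1 bits lets the carry jump the whole block at once.

def carries_with_skip(value: int, blocks=(6, 6, 6, 6, 6)) -> list:
    carries, carry, pos = [], 1, 0
    for width in blocks:
        bits = (value >> pos) & ((1 << width) - 1)
        if bits == (1 << width) - 1:
            carries.extend([carry] * width)   # "express train": carry-in reaches every bit
        else:                                 # "local train": walk the chain bit by bit
            for i in range(width):
                carries.append(carry)
                carry &= (bits >> i) & 1      # a 0 bit blocks the carry
        pos += width
    return carries

def increment_with_skip(value: int) -> int:
    result = value
    for i, c in enumerate(carries_with_skip(value)):
        result ^= c << i                      # flip every bit that receives a carry
    return result & 0x3FFFFFFF

assert increment_with_skip(0) == 1
assert increment_with_skip(0b111111111111) == 0b111111111111 + 1   # carry skips two full blocks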
The alignment network

The bottom part of the prefetcher rotates data to align it as needed. Unlike some processors, the x86 does not enforce aligned memory accesses. That is, a 32-bit value does not need to start on a 4-byte boundary in memory. As a result, a 32-bit value may be split across two 32-bit rows of the prefetch queue. Moreover, when the instruction decoder fetches one byte of an instruction, that byte may be at any position in the prefetch queue.

To deal with these problems, the prefetcher includes an alignment network that can rotate bytes to output a byte, word, or four bytes with the alignment required by the rest of the processor. The diagram below shows part of this alignment network. Each bit exiting the prefetch queue (top) has four wires, for rotates of 24, 16, 8, or 0 bits. Each rotate wire is connected to one of the 32 horizontal bit lines. Finally, each horizontal bit line has an output tap, going to the datapath below. (The vertical lines are in the chip's lower M1 metal layer, while the horizontal lines are in the upper M2 metal layer. For this photo, I removed the M2 layer to show the underlying layer. Shadows of the original horizontal lines are still visible.)

Part of the alignment network.

The idea is that by selecting one set of vertical rotate lines, the 32-bit output from the prefetch queue will be rotated left by that amount. For instance, to rotate by 8, bits are sent down the "rotate 8" lines. Bit 0 from the prefetch queue will energize horizontal line 8, bit 1 will energize horizontal line 9, and so forth, with bit 31 wrapping around to horizontal line 7. Since horizontal bit line 8 is connected to output 8, the result is that bit 0 is output as bit 8, bit 1 is output as bit 9, and so forth.

The four possibilities for aligning a 32-bit value. The four bytes above are shifted as specified to produce the desired output below.

For the alignment process, one 32-bit output may be split across two 32-bit entries in the prefetch queue in four different ways, as shown above. These combinations are implemented by multiplexers and drivers. Two 32-bit multiplexers select the two relevant rows in the prefetch queue (blue and green above). Four 32-bit drivers are connected to the four sets of vertical lines, with one set of drivers activated to produce the desired shift. Each byte of each driver is wired to achieve the alignment shown above. For instance, the rotate-8 driver gets its top byte from the "green" multiplexer and the other three bytes from the "blue" multiplexer. The result is that the four bytes, split across two queue rows, are rotated to form an aligned 32-bit value.
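The net effect of the multiplexers, drivers, and rotate lines can be modelled in a few lines of Python. This is a behavioural sketch only; the helper name and the example row contents are mine. Two queue rows are selected, then the four bytes starting at the value's offset are picked out, which is what the byte rotation accomplishes.

# Model of the alignment network's effect: extract an aligned 32-bit value that
# starts `offset` bytes into the low queue row and may spill into the next row.

def align(row_low: bytes, row_high: bytes, offset: int) -> bytes:
    assert len(row_low) == 4 and len(row_high) == 4 and 0 <= offset <= 3
    combined = row_low + row_high        # the two selected 32-bit prefetch-queue rows
    return combined[offset:offset + 4]   # the rotation selects these four bytes

# 0x12345678 stored little-endian, starting at byte 2 of a row:
low = bytes([0xAA, 0xBB, 0x78, 0x56])
high = bytes([0x34, 0x12, 0xCC, 0xDD])
assert align(low, high, 2) == bytes([0x78, 0x56, 0x34, 0x12])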
Sign extension

The final circuit is sign extension. Suppose you want to add an 8-bit value to a 32-bit value. An unsigned 8-bit value can be extended to 32 bits by simply filling the upper bits with zeroes. But for a signed value, it's trickier. For instance, -1 is the eight-bit value 0xFF, but the 32-bit value is 0xFFFFFFFF. To convert an 8-bit signed value to 32 bits, the top 24 bits must be filled in with the top bit of the original value (which indicates the sign). In other words, for a positive value, the extra bits are filled with 0, but for a negative value, the extra bits are filled with 1. This process is called sign extension.9

In the 386, a circuit at the bottom of the prefetcher performs sign extension for values in instructions. This circuit supports extending an 8-bit value to 16 bits or 32 bits, as well as extending a 16-bit value to 32 bits. This circuit will extend a value with zeros or with the sign, depending on the instruction.

The schematic below shows one bit of this sign extension circuit. It consists of latches on the left and right, with a multiplexer in the middle. The latches are constructed with a standard 386 circuit using a CMOS switch (see footnote).7 The multiplexer selects one of three values: the bit value from the swap network, 0 for sign extension, or 1 for sign extension. The multiplexer is constructed from a CMOS switch (used when the bit value is selected) and two transistors (for the 0 or 1 values). This circuit is replicated 32 times, although the bottom byte only has the latches, not the multiplexer, as sign extension does not modify the bottom byte.

The sign extend circuit associated with bits 31-8 from the prefetcher.

The second part of the sign extension circuitry determines if the bits should be filled with 0 or 1 and sends the control signals to the circuit above. The gates on the left determine if the sign extension bit should be a 0 or a 1. For a 16-bit sign extension, this bit comes from bit 15 of the data, while for an 8-bit sign extension, the bit comes from bit 7. The four gates on the right generate the signals to sign extend each bit, producing separate signals for the bit range 31-16 and the range 15-8.

This circuit determines which bits should be filled with 0 or 1.

The layout of this circuit on the die is somewhat unusual. Most of the prefetcher circuitry consists of 32 identical columns, one for each bit.8 The circuitry above is implemented once, using about 16 gates (buffers and inverters are not shown above). Despite this, the circuitry above is crammed into bit positions 17 through 7, creating irregularities in the layout. Moreover, the implementation of the circuitry in silicon is unusual compared to the rest of the 386. Most of the 386's circuitry uses the two metal layers for interconnection, minimizing the use of polysilicon wiring. However, the circuit above also uses long stretches of polysilicon to connect the gates.

Layout of the sign extension circuitry. This circuitry is at the bottom of the prefetch queue.

The diagram above shows the irregular layout of the sign extension circuitry amid the regular datapath circuitry that is 32 bits wide. The sign extension circuitry is shown in green; this is the circuitry described at the top of this section, repeated for each bit 31-8. The circuitry for bits 15-8 has been shifted upward, perhaps to make room for the sign extension control circuitry, indicated in red. Note that the layout of the control circuitry is completely irregular, since there is one copy of the circuitry and it has no internal structure. One consequence of this layout is the wasted space to the left and right of this circuitry block, the tan regions with no circuitry except vertical metal lines passing through. At the far right, a block of circuitry to control the latches has been wedged under bit 0.

Intel's designers went to great effort to minimize the size of the processor die, since a smaller die saves substantial money. This layout must have been the most efficient they could manage, but I find it aesthetically displeasing compared to the regularity of the rest of the datapath.
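The operation itself is simple to state in code. Here is a minimal Python sketch of zero- and sign-extension (my own function, covering the 8-to-16, 8-to-32, and 16-to-32 cases the circuit supports): the upper bits are filled with zeros, or with copies of the sign bit for a signed value.

def extend(value: int, from_bits: int, to_bits: int, signed: bool) -> int:
    value &= (1 << from_bits) - 1
    sign = (value >> (from_bits - 1)) & 1
    if signed and sign:
        value |= ((1 << (to_bits - from_bits)) - 1) << from_bits  # fill the top bits with 1s
    return value

assert extend(0xFF, 8, 32, signed=True) == 0xFFFFFFFF     # -1 extends to 32-bit -1
assert extend(0xFF, 8, 32, signed=False) == 0x000000FF    # unsigned: zero-fill
assert extend(0x8001, 16, 32, signed=True) == 0xFFFF8001  # negative 16-bit value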
How instructions flow through the chip

Instructions follow a tortuous path through the 386 chip. First, the Bus Interface Unit in the upper right corner reads instructions from memory and sends them over a 32-bit bus (blue) to the prefetch unit. The prefetch unit stores the instructions in the 16-byte prefetch queue.

Instructions follow a twisting path to and from the prefetch queue.

How is an instruction executed from the prefetch queue? It turns out that there are two distinct paths. Suppose you're executing an instruction to add 0x12345678 to the EAX register. The prefetch queue will hold the five bytes 05 (the opcode), 78, 56, 34, and 12. The prefetch queue provides opcodes to the decoder one byte at a time over the 8-bit bus shown in red. The bus takes the lowest 8 bits from the prefetch queue's alignment network and sends this byte to a buffer (the small square at the head of the red arrow). From there, the opcode travels to the instruction decoder.10 The instruction decoder, in turn, uses large tables (PLAs) to convert the x86 instruction into a 111-bit internal format with 19 different fields.11

The data bytes of an instruction, on the other hand, go from the prefetch queue to the ALU (Arithmetic Logic Unit) through a 32-bit data bus (orange). Unlike the previous buses, this data bus is spread out, with one wire through each column of the datapath. This bus extends through the entire datapath so values can also be stored into registers. For instance, the MOV (move) instruction can store a value from an instruction (an "immediate" value) into a register.
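As a quick sanity check on that example, a few lines of Python show how the queued bytes split between the two paths: the opcode byte goes to the decoder, while the immediate bytes are reassembled little-endian on the data bus. The variable names are mine; the byte values come from the example above.

queue_bytes = [0x05, 0x78, 0x56, 0x34, 0x12]   # ADD EAX, imm32 as it sits in the queue

opcode = queue_bytes[0]                        # sent one byte at a time to the decoder
immediate = int.from_bytes(bytes(queue_bytes[1:5]), "little")  # sent down the 32-bit data bus

assert opcode == 0x05
assert immediate == 0x12345678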
Conclusions

The 386's prefetch queue contains about 7400 transistors, more than an entire Intel 8080 processor. (And this is just the queue itself; I'm ignoring the prefetch control logic.) This illustrates the rapid advance of processor technology: part of one functional unit in the 386 contains more transistors than an entire 8080 processor from 11 years earlier. And this unit is less than 3% of the entire 386 processor.

Every time I look at an x86 circuit, I see the complexity required to support backward compatibility, and I gain more understanding of why RISC became popular. The prefetcher is no exception. Much of the complexity is due to the 386's support for unaligned memory accesses, requiring a byte shift network to move bytes into 32-bit alignment. Moreover, at the other end of the instruction bus is the complicated instruction decoder that decodes intricate x86 instructions. Decoding RISC instructions is much easier.

In any case, I hope you've found this look at the prefetch circuitry interesting. I plan to write more about the 386, so follow me on Bluesky (@righto.com) or RSS for updates. I've written multiple articles on the 386 previously; a good place to start might be my survey of the 386 dies.

Footnotes and references

1. The width of the circuitry for one bit changes a few times: while the prefetch queue and segment descriptor cache use a circuit that is 66 µm wide, the datapath circuitry is a bit tighter at 60 µm. The barrel shifter is even narrower at 54.5 µm per bit. Connecting circuits with different widths wastes space, since the wiring to connect the bits requires horizontal segments to adjust the spacing. But it also wastes space to use widths that are wider than needed. Thus, changes in the spacing are rare, happening only where the tradeoffs make it worthwhile. ↩

2. The Intel 8086 processor had a six-byte prefetch queue, while the Intel 8088 (used in the original IBM PC) had a prefetch queue of just four bytes. In comparison, the 16-byte queue of the 386 seems luxurious. (Some 386 processors, however, are said to only use 12 bytes due to a bug.) The prefetch queue assumes instructions are executed in linear order, so it doesn't help with branches or loops. If the processor encounters a branch, the prefetch queue is discarded. (In contrast, a modern cache will work even if execution jumps around.) Moreover, the prefetch queue doesn't handle self-modifying code. (It used to be common for code to change itself while executing to squeeze out extra performance.) By loading code into the prefetch queue and then modifying instructions, you could determine the size of the prefetch queue: if the old instruction was executed, it must be in the prefetch queue, but if the modified instruction was executed, it must be outside the prefetch queue. Starting with the Pentium Pro, x86 processors flush the prefetch queue if a write modifies a prefetched instruction. ↩

3. The prefetch unit generates "linear" addresses that must be translated to physical addresses by the paging unit (ref). ↩

4. I don't know which phase of the clock is phase 1 and which is phase 2, so I've assigned the numbers arbitrarily. The 386 creates four clock signals internally from a clock input CLK2 that runs at twice the processor's clock speed. The 386 generates a two-phase clock with non-overlapping phases. That is, there is a small gap between when the first phase is high and when the second phase is high. The 386's circuitry is controlled by the clock, with alternate blocks controlled by alternate phases. Since the clock phases don't overlap, this ensures that logic blocks are activated in sequence, allowing the orderly flow of data. But because the 386 uses CMOS, it also needs active-low clocks for the PMOS transistors. You might think that you could simply use the phase 1 clock as the active-low phase 2 clock and vice versa. The problem is that these clock phases overlap when used as active-low; there are times when both clock signals are low. Thus, the two clock phases must be explicitly inverted to produce the two active-low clock phases. I described the 386's clock generation circuitry in detail in this article. ↩

5. The Manchester carry chain is typically used in an adder, which makes it more complicated than shown here. In particular, a new carry can be generated when two 1 bits are added. Since we're looking at an incrementer, this case can be ignored. The Manchester carry chain was first described in "Parallel addition in digital computers: a new fast 'carry' circuit". It was developed at the University of Manchester in 1959 and used in the Atlas supercomputer. ↩

6. For some reason, the incrementer uses a completely different XOR circuit from the comparator, built from a multiplexer instead of logic. In the circuit below, the two CMOS switches form a multiplexer: if the first input is 1, the top switch turns on, while if the first input is 0, the bottom switch turns on. Thus, if the first input is 1, the second input passes through and then is inverted to form the output. But if the first input is 0, the second input is inverted before the switch and then is inverted again to form the output. Thus, the second input is inverted if the first input is 1, which is a description of XOR.

The implementation of an XOR gate in the incrementer.

I don't see any clear reason why two different XOR circuits were used in different parts of the prefetcher. Perhaps the available space for the layout made a difference. Or maybe the different circuits have different timing or output current characteristics. Or it could just be the personal preference of the designers. ↩
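The multiplexer-style XOR in this footnote is easy to sanity-check in Python (a throwaway sketch, not the circuit): selecting either the second input or its inverse, based on the first input, is exactly XOR.

def mux_xor(a: int, b: int) -> int:
    return (1 - b) if a else b    # a=1 selects NOT b; a=0 selects b

assert all(mux_xor(a, b) == (a ^ b) for a in (0, 1) for b in (0, 1))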
7. The latch circuit is based on a CMOS switch (or transmission gate) and a weak inverter. Normally, the inverter loop holds the bit. However, if the CMOS switch is enabled, its output overpowers the signal from the weak inverter, forcing the inverter loop into the desired state. The CMOS switch consists of an NMOS transistor and a PMOS transistor in parallel. By setting the top control input high and the bottom control input low, both transistors turn on, allowing the signal to pass through the switch. Conversely, by setting the top input low and the bottom input high, both transistors turn off, blocking the signal. CMOS switches are used extensively in the 386 to form multiplexers, create latches, and implement XOR. ↩

8. Most of the 386's control circuitry is to the right of the datapath, rather than awkwardly wedged into the datapath. So why is this circuit different? My hypothesis is that since the circuit needs the values of bit 15 and bit 7, it made sense to put the circuitry next to bits 15 and 7; if this control circuitry were off to the right, long wires would need to run from bits 15 and 7 to the circuitry. ↩

9. In case this post is getting tedious, I'll provide a lighter footnote on sign extension. The obvious mnemonic for a sign extension instruction is SEX, but that mnemonic was too risqué for Intel. The Motorola 6809 processor (1978) used this mnemonic, as did the related 68HC12 microcontroller (1996). However, Steve Morse, architect of the 8086, stated that the sign extension instructions on the 8086 were initially named SEX but were renamed before release to the more conservative CBW and CWD (Convert Byte to Word and Convert Word to Double word). The DEC PDP-11 was a bit contradictory. It has a sign extend instruction with the mnemonic SXT; the Jargon File claims that DEC engineers almost got SEX as the assembler mnemonic, but marketing forced the change. On the other hand, SEX was the official abbreviation for Sign Extend (see the PDP-11 Conventions Manual and the PDP-11 Paper Tape Software Handbook), and SEX was used in the microcode for sign extend. RCA's CDP1802 processor (1976) may have been the first with a SEX mnemonic, although it used SEX for the unrelated Set X instruction. See also this Retrocomputing Stack Exchange page. ↩

10. It seems inconvenient to send instructions all the way across the chip from the Bus Interface Unit to the prefetch queue and then back across the chip to the instruction decoder, which is next to the Bus Interface Unit. But this was probably the best alternative for the layout, since you can't put everything close to everything. The 32-bit datapath circuitry is on the left, organized into 32 columns. It would be nice to put the Bus Interface Unit over there too, but there isn't room, so you end up with the wide 32-bit data bus going across the chip. Sending instruction bytes across the chip is less of an impact, since the instruction bus is just 8 bits wide. ↩

11. See "Performance Optimizations of the 80386", Slager, October 1986, in Proceedings of ICCD, pages 165-168. ↩

Code Matters

It looks like the code that the newly announced Figma Sites is producing isn’t the best. There are some cool Figma-to-WordPress workflows; I hope Sites gets more people exploring those options.

What got you here…

John Siracusa: Apple Turnover From virtue comes money, and all other good things. This idea rings in my head whenever I think about Apple. It’s the most succinct explanation of what pulled Apple from the brink of bankruptcy in the 1990s to its astronomical success today. Don’
