So we have eduroam at our university and, unsurprisingly, wicd is not on their official support list, but with some googling the wonderful Arch Wiki had the answer. Well, almost. Save the following as /etc/wicd/encryption/templates/ttls-80211:

```
name = TTLS for Wireless
author = Alexander Clouter
version = 1
require anon_identity *Anonymous_Username identity *Identity password *Password
optional ca_cert *Path_to_CA_Cert
-----
ctrl_interface=/var/run/wpa_supplicant
network={
    ssid="$_ESSID"
    scan_ssid=$_SCAN
    key_mgmt=WPA-EAP
    eap=TTLS
    ca_cert="$_CA_CERT"
    phase2="auth=MSCHAPv2 auth=PAP"
    anonymous_identity="$_ANON_IDENTITY"
    identity="$_IDENTITY"
    password="$_PASSWORD"
}
```

The only difference from the wiki is that the line subject_match="$_CERT_SUBJECT" is removed. In a terminal:

```
cd /etc/wicd/encryption/templates
echo ttls-80211 >> active
```

Then open wicd (I use wicd-curses), choose TTLS for Wireless under the security mode, and enter your credentials from this page....
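To make the template a little more concrete: everything below the ----- line is what wicd hands to wpa_supplicant, with the $_ variables replaced by the values you enter in the connection properties. Roughly, with made-up example credentials (the identity, password, and CA path below are placeholders), the generated network block would look like this:

```
ctrl_interface=/var/run/wpa_supplicant
network={
    ssid="eduroam"
    scan_ssid=1
    key_mgmt=WPA-EAP
    eap=TTLS
    ca_cert="/etc/ssl/certs/ca-certificates.crt"
    phase2="auth=MSCHAPv2 auth=PAP"
    anonymous_identity="anonymous@example.edu"
    identity="username@example.edu"
    password="your-password"
}
```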
over a year ago


More from Jonas Hietala

Some VORON 0 mods

I recently completed my VORON 0 build and I was determined to leave it as-is for a while and to start modding my VORON Trident… So before embarking on my larger Trident modding journey I decided to work on the VORON 0 just a little bit more.

HEPA filter

With the Nevermore Micro V4 I had active carbon filtering, but I also wanted a HEPA filter that would provide negative air pressure to the printer. I found the HEPA filter by JNP for the VORON 0.1 and a mount for the VORON 0.2 that I installed. For the fans I used two Noctua NF-A4x10 FLX fans and I spliced them together with the Nevermore filter, allowing the MCU to control all the filter fans together (see the config sketch at the end of this post). It might have been better to buy the 5V versions and connect them to the 5V output to have them always on, but by then I had already ordered the other version. Oh well.

Back meshed panel

The small 5V fan for the Raspberry Pi was super loud and I wanted to replace it with something. Because the Raspberry Pi Zero doesn't get that hot I removed the fan and replaced the back panel with a meshed variant, which I hope will provide enough airflow to keep the electronics cool. (There are other variants with integrated fans if I realize this wasn't enough.)

Modesty mesh

The wiring is super ugly and I stumbled upon the modesty mesh that hides the wires well from the sides. Not at all necessary, but they make the printer a little prettier.

Full size panels

One thing that bothered me with the stock VORON 0.2 was the gaps between the tophat and the side panels and front door. I went looking for a mod with full-sized panels and found the ZeroPanels mod. Instead of magnets, the printed parts clip into the extrusions pretty hard while still allowing you to pull them off when you want to. It works really well honestly. The clips were slightly difficult to print but manageable.

I was looking at the BoxZero mod for a proper full-sized panels mod, but I didn't want to tear apart the printer and rebuild the belt path, so I simply replaced the stock panels with full-sized ones. This does leave some air gaps at the back and front of the printer right next to the belt that I simply covered with some tape:

Some tape to cover the gaps around the belts.

While the clips are good for panels you don't remove that often, they're too much to use for the front door. They have some magnetic clips you can use, but I'm honestly perplexed about how to use them to good effect. The standard VORON 0 handles don't consider the extra 3mm the foam tape adds, leaving a gap that severely reduces the pulling force of the magnets. Similarly, the magnet clips included in ZeroPanels surprisingly have the same issue. For the door handle I used the stealth handle found in the Voron 0.2 fullsize ZeroPanel mod, which does take the foam tape into consideration.

Three different magnet holders; at the top the Stealth handle's holders that come out 3mm, in the middle the 6mm holder, and at the bottom the standard magnet holder.

There's a variant of the clips for 6mm magnets in the pull requests that I used by pushing in two 3x2mm magnets and super gluing one 10x3mm magnet on top, so it sticks out the 3mm extra distance the foam tape adds. (Yes, maybe just the 10x3mm magnet would be enough.) For the outside I used the standard ZeroPanels holders for 10x3mm magnets, allowing the magnets to close really tightly against each other.

Extra magnets at the top of the printer to get a proper seal.
The panels I bought were just slightly too wide, causing the side panels to bend a little, which made it hard to get a close seal for the front and side panels. I had to file down the clips on the front door to keep them from colliding with the side panel clips, and I had to add extra clips and magnets to get the panels to close tightly against the foam tape.
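A rough sketch of what the Klipper side of the spliced filter fans can look like (this isn't from the build; the section name and pin are placeholders, so check which fan header your filter is actually wired to):

```
# Nevermore + HEPA fans spliced onto one controllable fan output.
# "PC7" is a placeholder pin; use the header the filter is wired to.
[fan_generic filter_fans]
pin: PC7
max_power: 1.0
kick_start_time: 0.5

# Convenience macros to switch the filters on and off.
[gcode_macro FILTER_ON]
gcode:
  SET_FAN_SPEED FAN=filter_fans SPEED=0.8

[gcode_macro FILTER_OFF]
gcode:
  SET_FAN_SPEED FAN=filter_fans SPEED=0
```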

a week ago 3 votes
Let's build a VORON 0

About 1.5 years ago I ventured into 3D printing by building a VORON Trident. It was a very fun project and I've even used the printer quite a bit. Naturally, I had to build another one and this time I opted for the cute VORON 0.

Why another printer?

I really like my VORON Trident and it'll continue to be my main printer for the foreseeable future, but a second printer would do two important things for me:

Act as a backup printer if my Trident breaks. A printer made partially of printed parts is great as you can easily repair it… But only if you have a working printer to print the parts. It would also be very annoying if I disassemble the printer because I want to mod it and realize I've forgotten to print a part I needed.

Building printers is really fun. Building the VORON Trident is one of the most fun and rewarding projects I've done.

Why a VORON 0?

These properties make the VORON 0 an ideal secondary printer for me:

You need to assemble the VORON 0 yourself (a feature, not a bug)
Prints ABS/ASA well (for printer parts)
Very moddable and truly open source
It's tiny

The VORON 0 to the left and the VORON Trident 250 to the right.

It's really small, which is perfect for me as I have a limited amount of space. It would be very fun to build a VORON 2.4 (or even a VORON Phoenix) but I really don't have space for more printers.

Getting the parts

I opted to buy a kit instead of self-sourcing the parts as it's usually cheaper and requires a lot less work, even if you replace some parts. This is what I ended up getting:

A VORON 0 kit from Lecktor
Parts for a Dragon Burner toolhead
Parts for a Nevermore V4 active carbon filter
Later on, I replaced the SKR Mini E3 V2 that came with the kit with the V3

Lots of delays

I ordered a VORON 0 from Lecktor in February 2024 and it took roughly 4 months before I got the first shipment of parts, and it wasn't until the end of 2024 that I had received all the parts needed to complete the build. The wait was annoying… While I can't complain about the quality of the parts, with the massive delays I regret ordering from Lecktor and in hindsight I should've ordered an LDO kit from 3DJake, like I was first considering.

Printing parts myself

So what do you do when you can't start the build? You print parts!

A box of some of the printed parts for the build (and many I later threw away).

There's something very satisfying about printing parts you then build a printer with. This time I wanted to make a colorful printer and I came up with this mix of filament:

PolyLite ASA Yellow
Formfutura EasyFil ABS Light Green
Formfutura EasyFil ABS Light Blue
Formfutura EasyFil ABS Magenta

I think they made the printer look great.

The build

I won't do as detailed a build log as I did when building the VORON Trident, but I tried to take some pictures. Scroll on!

Frames and bed

The linear Y-rails. The kit comes with the Kirigami bed mod. The frame with A/B motors. Building the bottom of the printer with feet, power supply, and display.

MGN9 instead of MGN7 X-axis

After I assembled the X-axis I noticed a problem: the carriage collides with the stock A drive. The reason is that the kit comes with MGN9 rails for the X-axis instead of the standard MGN7 rails. This required me to reprint modified A/B drives, the X-carriage, and alignment tools.

The carriage passes the modded B drive.

Belts

Starting to install the belt. The belt is tight.
Dragon Burner toolhead

I got the parts needed to build the standard mini stealthburner… But I'm attracted to playing around with new stuff and I decided to try out the Dragon Burner instead. I went with it because it's quite popular, it has good cooling (I print a bunch of PLA), and I haven't tried it out yet.

The fans are inserted. I don't care about LEDs so I inserted an opaque magenta part instead. I think it looks really good. The back of the Dragon Burner.

I opted for the Rapido 2 instead of the Dragon that came with the kit because the Dragon has problems printing PLA. I was a bit confused about how to route the wires as there was very little space when mounting the toolhead on the carriage. Routing the wires close to the fans, clipping off the ears of the fans, and holding it together with cable ties in this way worked for me.

Galileo 2 standalone

Dragon Burner together with the Galileo 2 extruder mounted on the printer.

For the extruder I opted for the standalone version of Galileo 2. I've used Galileo 2 on the Trident but I hated the push-down latch it uses in the Stealthburner configuration. The latch eventually broke by pulling out a heat-set insert, so I went back to the Clockwork 2 on the Trident, giving me the parts to rebuild the Galileo for the VORON 0 in a standalone configuration.

The parts for Galileo 2. There will be left-overs from the Stealthburner variant.

The build was really fast and simple—compared to the Stealthburner variant it's night and day. I didn't even think to take a break for pictures.

Nevermore filter

Since I want to be able to print ABS I feel I need to have an activated carbon filter. I wanted to have an exhaust fan with a HEPA filter as well, but I'll leave that to a mod in the future. The Nevermore V4 is an activated carbon filter that fits well in the VORON 0. I fastened the fan using a strip of VHB—it was a struggle to position it in the middle.

The Nevermore is mounted standing in the side of the printer.

Just remember to preload the extrusion with extra M3 nuts when you assemble the printer. (I've heard LDO has nuts you can insert after… Sounds great.)

Panels

With the panel and spool holder at the back. Please ignore the filament path in this picture, it'll interfere with the rear belt when routed behind the umbilical cable. With the tophat and door installed.

I'm slightly annoyed with the small gaps and holes the printer has (mainly between the tophat and the panels at the bottom half). I later changed some of the parts related to the top hat to match the colorscheme better.

Wiring

Wiring was simpler than for the Trident but it was harder to make the wiring pretty. Thank god I could cover it up.

The underside of the printer with the power, 5V converter, display, and Z-motor. Back of the printer with the Raspberry Pi and MCU.

Raspberry Pi

The Raspberry Pi only has two cables: power and communication over the GPIO pins, and a display via USB. The Pi communicates and gets power over the TFT connection on the MCU.

Toolhead

The kit came with a toolhead board and breakout board for an umbilical setup:

The toolhead board. The breakout board.

I did run into an issue where the polarity of the fans on the toolhead board did not match the polarity of the fans on the MCU, leading to some frustration where the fans refused to spin. I ended up swapping the polarity using the cables from the breakout board to the MCU.

Chamber thermistor

The MCU only has two thermistor ports and they're used for the hotend and bed thermistors.
For the chamber thermistor (that's integrated into the breakout board) I use the MOSI pin on the SPI1 8-pin header:

The chamber thermistor connected to MOSI and ground on the SPI1 header.

SKR mini E3 v3

I got an SKR mini E3 v2 with the kit but I replaced it with the v3 for two reasons:

A FAN output, used for the Nevermore filter
A filament runout sensor

There's not much to say about the extra FAN output, but the filament runout sensor port on the MCU has 3 pins, while the VORON 0.2 style runout sensor has 2 pins. I reused the prepared y-endstop I got with the kit and scratched away some of the plastic to make the 2-pin connection fit the 3 pins on the MCU (the +5V pin isn't needed):

The filament runout sensor connected to E0-stop.

Klipper setup

I followed the VORON documentation and chose Mainsail as I've been happy with it on my Trident. I'm not going to describe everything and will only call out some issues I had or extra steps I had to take.

MCU firmware

The VORON documentation assumes USB communication so the default firmware instructions didn't work for me. According to BigTreeTech's documentation, if you communicate over USART2 (the TFT port) then you need to compile the firmware with Communication interface set to Serial (on USART2 PA3/PA2). You then need to use this Klipper configuration:

```
[mcu]
serial: /dev/ttyAMA0
restart_method: command
```

It took a long time for me to figure out as I had a display connected via USB, so I thought the display was the MCU and got stuck at a "Your Klipper version is: xxx MCU(s) which should be updated: xxx" error.

Filament runout

```
[filament_switch_sensor Filament_Runout_Sensor]
pause_on_runout: True
runout_gcode: PAUSE
switch_pin: PC15
```

Chamber thermistor

According to this comment this is the config to use the SPI header for a thermistor:

```
[temperature_sensor chamber_temp]
sensor_type: Generic 3950
sensor_pin: PA7
pullup_resistor: 10000
```

Works for me™

Display

It's easy to flash the display directly from the Raspberry Pi, although the first firmware I built was too large. There are optional features you can remove, but I removed too many so the configuration for the buttons wasn't accepted. These were the features that ended up working for me:

```
[*] Support GPIO "bit-banging" devices
[*] Support LCD devices
[ ] Support thermocouple MAX sensors
[ ] Support adxl accelerometers
[ ] Support lis2dw and lis3dh 3-axis accelerometers
[ ] Support MPU accelerometers
[*] Support HX711 and HX717 ADC chips
[ ] Support ADS 1220 ADC chip
[ ] Support ldc1612 eddy current sensor
[ ] Support angle sensors
[*] Support software based I2C "bit-banging"
[*] Support software based SPI "bit-banging"
```

Sensorless homing

I was nervous setting up sensorless homing, fearing that without a physical switch the printer might decide to burn the motor against the edge or something. (I really have no idea how it works, hence my fear.) In the end it was straightforward. The VORON 0 example firmware was already configured for sensorless homing and the only things I had to do were:

Set up the X-DIAG and Y-DIAG pins on the board
Tweak the driver_SGTHRS values (I landed on 85, down from 255)

And now I have sensorless homing working consistently. What confused me was that the sensorless homing guide and the homing macros it links to were slightly different from the VORON 0 example firmware and it wasn't clear if I had to make all the changes or not. (I did not.)
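For reference, the Klipper side of this boils down to a couple of settings along these lines. It's only a sketch—the pin name is a placeholder and the SGTHRS value is whatever your own tuning lands on:

```
# Sensorless homing sketch for the X axis (the Y axis is analogous).
# The diag_pin is a placeholder; check your board's documentation.
# (uart_pin, run_current, and the rest of the driver section are omitted.)
[tmc2209 stepper_x]
diag_pin: ^PC0
driver_SGTHRS: 85          # stall sensitivity, tuned down from the maximum of 255

[stepper_x]
endstop_pin: tmc2209_stepper_x:virtual_endstop
homing_retract_dist: 0
```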
Some random issues I encountered

In typical 3D printer fashion, you'll always run into various issues, for example:

I got the mcu shutdown: Timer too close error a few times. I don't know what I did but it only happened a couple of times at the beginning.
The filament sensor had some consistency issues. Some extra tape on the bearing seemed to fix it.
The filament keeps getting stuck in the extruder after unload. I'm still having issues, but forgetting to tighten the nozzle and using a too-short PTFE tube didn't help.
I had trouble getting the filament to stick to the bed. Super frustrating to be honest. I re-calibrated the z offset and thumb screws a bunch of times and (right now) it seems to work fairly well. Even though you're not supposed to need automatic bed leveling for a printer this small, I can't help but miss the "just works" feeling I have with the Trident.

Initial thoughts on the printer

A model I printed for one of my kids. It came out really well.

I haven't printed that much with the printer yet but I have some positive things to say about it: the Dragon Burner is great when printing PLA (which I use a lot). But I have some negative things to say too: the fans are horribly loud, and the print movement is also too loud for my taste. It's poorly insulated. For example there are gaps between the top hat and the rest of the printer that I don't see a good way to cover up.

Overall though I'm very happy with it. I wouldn't recommend it as a first printer or to someone who just wants a tool that works out of the box, but for people like me who wanted to build a backup/secondary printer I think it's great.

What's next?

With a secondary printer finally up and running I can now start working on some significant mods for my Trident! This is the tentative plan right now:

Inverted electronics mod.
Replace Stealthburner with another toolhead, most likely A4T-toolhead.
Build a BoxTurtle for multi-color support.

But we'll see when I manage to get to it. I'm not in a rush and I should take a little break and play with my VORON 0 and perhaps work on my other dozen or so projects that lie dormant.

a month ago 23 votes
I'll give up Neovim when you pry it from my cold, dead hands

I recently came upon a horror story where a developer was forced to switch editor from Neovim to Cursor and I felt I had to write a little to cleanse myself of the disgust I felt.

Two different ways of approaching an editor

I think that there are two opposing ways of thinking about the tool that is an editor:

Refuse to personalize anything and only use the basic features: "An editor is a simple tool I use to get the job done."
Get stuck in configuration hell and spend tons of time tweaking minor things: "An editor is a highly personalized tool that works the way I want."

These are the extreme ends of the spectrum to make a point and most developers will fall somewhere in between. It's not a static proposition; I've had periods in my life where I've used the same Vim configuration for years and other times I've spent more time rewriting my Neovim config than doing useful things. I don't differentiate between text editors and IDEs as I don't find the distinction very meaningful. They're all just editors.

Freedom of choice is important

Freedom of choice is more to be treasured than any possession earth can give.
David O. McKay

Some developers want zero configuration while others want to configure their editor so it's just right. Either way is fine and I've met excellent developers from both sides. But removing the power of choice is a horrible idea as you're forcing developers to work in a way they're not comfortable with, not productive with, or simply don't like. You're bound to make some of the developers miserable or see them leave (usually the best ones, who can easily find another job).

To explain how important an editor might be to some people, I give you this story about Stephen Hendry—one of the most successful snooker players ever—and how important his cue was to him:

In all the years I've been playing I've never considered changing my cue. It was the first cue I ever bought, aged 13, picked from a cabinet in a Dunfermline snooker centre just because I liked the Rex Williams signature on it. I saved £40 to buy it. It's a cheap bit of wood and it's been the butt of other players' jokes for ages. Alex Higgins said it was 'only good for holdin' up f*g tomatoes!' But I insist on sticking with it. And I've won a lot of silverware, including seven World Championship trophies, with it. It's a one-piece which I carry in a wooden, leather-bound case that's much more expensive than the cue it houses. But in 2003, at Glasgow airport after a flight from Bangkok, it emerges through the rubber flaps on the carousel and even at twenty yards I can see that both case and cue are broken. Snapped almost clean in two, the whole thing now resembling some form of shepherd's crook. The cue comes to where I'm standing, and I pick it up, the broken end dangling down forlornly. I could weep. Instead, I laugh. 'Well,' I say to my stunned-looking friend John, 'that's my career over.'
Stephen Hendry, The Mirror

Small improvements lead to large long-term gains

Kaizen isn't about massive overhauls or overnight success. Instead, it focuses on small, continuous improvements that add up to significant long-term gains.
What is Kaizen? A Guide to Continuous Improvement

I firmly believe that even small improvements are worth it as they add up over time (also see compound interest and how it relates to financial investments). An editor is a great example where even small improvements may have a big effect, for the simple reason that you spend so much time in your editor.
I've spent hours almost every day inside (neo)vim since I started using it 15+ years ago. Even simple things like quickly changing text inside brackets (ci[) instead of selecting text with your mouse might save hundreds of hours during a programming career—and that's just one example.

Naturally, as a developer you can find small but worthwhile improvements in other areas too, for instance:

Learning the programming languages and libraries you use a little better.
Customizing your keyboard and keyboard layout. This is more for comfort and health than speed but that makes it even more important, not less.
Increasing your typing speed. Some people dismiss typing speed as they say they're limited by their thinking, not typing. But the benefit of typing faster (and more fluidly) isn't really the overall time spent typing vs thinking; it's so you can continue thinking with as little interruption as possible. On some level you want to reduce the time typing in this chain: think… edit, think… edit, think… It's also why the Vim way of editing is so good—it's based on making small edits and returning quickly to normal (thinking) mode.

Some people ask how you can afford to spend time practicing Vim commands or configuring your editor, as it takes away time from work. But I ask you: with a programming career of several decades and tens of thousands of hours to spend in front of your computer, how can you afford not to?

Neovim is versatile

During the years I've done different things:

Switched keyboard and keyboard layout multiple times.
Been blogging and wrote a book.

The one constant through all of this has been Neovim. Neovim may not have the best language-specific integrations but it does everything well, and the benefit of having the same setup for everything you do is not to be underestimated. It pairs nicely with the idea of adding up small improvements over time; every small improvement that I add to my Neovim workflow will stay with me no matter what I work with.

I did use Emacs at work for years because their proprietary language only had an Emacs integration and I didn't have the time nor energy to create one for Neovim. While Evil made the experience survivable, I realized then that I absolutely hate having my work setup be different from my setup at home. People weren't overjoyed with being unable to choose their own editor and I've heard rumors that there's now an extension for Visual Studio.

Neovim is easily extensible

Neovim: a Personalized Development Environment
TJ DeVries
A different take on editing code

I've always felt that Vimscript is the worst part of Vim. Maybe that's a weird statement as the scriptability of Vim is one of its strengths; and to be fair, simple things are very nice:

```
nnoremap j gj
set expandtab
```

But writing complex things in Vimscript is simply not a great experience. One of the major benefits of Neovim is the addition of Lua as a first-class scripting language. Yes, Lua isn't perfect and it's often too verbose, but it's so much better than Vimscript. Lua is the main reason that the Neovim plugin ecosystem is currently a lot more vibrant than in Vim. Making it easier to write plugins is of course a boon, but the real benefit is in how it makes it even easier to make more complex customizations for yourself. Just plop down some Lua in the configuration files you already have and you're done. (Emacs worked this out to an even greater extent decades ago.)
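To give a flavor of what that looks like (my own illustration, not from the post): the two Vimscript lines above translate directly to Lua, and an autocommand like the one below is the kind of small, targeted customization that's easy to add. The pattern and the disable_autoformat flag are assumptions about how a format-on-save hook might be wired up.

```lua
-- The same settings as the Vimscript above, in Lua:
vim.keymap.set("n", "j", "gj")
vim.opt.expandtab = true

-- A small customization: turn off format-on-save for markdown files in a blog folder.
-- Assumes the format-on-save hook checks this buffer-local flag.
vim.api.nvim_create_autocmd({ "BufRead", "BufNewFile" }, {
  pattern = "*/blog/*.md",
  callback = function()
    vim.b.disable_autoformat = true
  end,
})
```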
One way I use this customizability is to help me when I'm blogging:

Maybe you don't need to create something this big, but even small things such as disabling autoformat for certain file types in specific folders can be incredibly useful. Approachability should not be underestimated. While plugins in Lua are understandably the focus today, Neovim can still use plugins written in Vimscript and 99% of your old Vim configuration will still work in Neovim.

Neovim won't go anywhere

The old is expected to stay longer than the young in proportion to their age.
Nassim Nicholas Taleb, "Antifragile"

The last big benefit with Neovim I'll highlight—and why I feel fine with investing even more time into Neovim—is that Neovim will most likely continue to exist and thrive for years if not decades to come. While Vim has—after an impressive 30 years of development—recently entered maintenance mode, activity in Neovim has steadily increased since the fork from Vim more than a decade ago. The amount of high-quality plugins, interest in Google trends, and GitHub activity have all been trending upwards. Neovim was also the most desired editor according to the latest Stack Overflow developer survey and the overall buzz and excitement in the community is at an all-time high.

With the self-reinforcing behavior and benefits of investing in a versatile and flexible editor with a huge plugin ecosystem such as Neovim, I see no reason for the trend to taper off anytime soon. Neovim will probably never be as popular as something like VSCode, but as an open source project backed by excited developers, Neovim will probably be around long after VSCode has been discontinued for The Next Big Thing.

2 months ago 26 votes
Securing my partner's digital life

I've been with Veronica for over a decade now and I think I'm starting to know her fairly well. Yet she still manages to surprise me. For instance, a couple of weeks ago she came and asked me about email security:

I worry that my email password is too weak. Can you help me change email address and make it secure?

It was completely unexpected—but I'm all for it.

The action plan

All heroic journeys need a plan; here's mine:

Get her own domain (the .com of her surname was available).
Migrate her email to Fastmail.
Set up Bitwarden as a password manager.
Use a YubiKey to secure the important services.

Why a domain?

If you ever want (or need) to change email providers it's very nice to have your own domain. For instance, Veronica has a hotmail.com address but she can't bring that with her if she moves to Fastmail. Worse, what if she gets locked out of her Outlook account for some reason? It might happen if you forget your password, someone breaks into your account, or even by accident. For example, Apple users recently got locked out of their Apple IDs without any apparent reason and Gmail has been notorious for locking out users for no reason. Some providers may be better but this is a systemic problem that can happen at any service.

In almost all cases, your email is your key to the rest of your digital life. The email address is your username and to reset your password you use your email. If you lose access to your email you lose everything. When you control your domain, you can point the domain to a new email provider and continue with your life.

Why pay for email?

One of the first things Veronica told me when I proposed that she'd change providers was that she didn't want to pay. It's a common sentiment online that email must be cheap (or even free). I don't think that email is the area where cost should be the most significant factor. As I argued for in why you should own your email's domain, your email is your most important digital asset. If email is so important, why try to be cheap about it? You should spend your money on the important things and shouldn't spend money on the unimportant things.

Paying for email gives you a couple of nice things:

Human support. It's all too easy to get shafted by algorithms where you might get banned because you triggered some edge case (such as resetting your password outside your usual IP address).
Ability to use your own domain. Having a custom domain is a paid feature at most email providers.
A long-term viable business. How do you run an email company if you don't charge for it? (You sell out your users or you close your business.)

Why a password manager?

The best thing you can do security-wise is to adopt a password manager. Then you don't have to try to remember dozens of passwords (leading to easy-to-remember and duplicate passwords) and can focus on remembering a single (stronger) password, confident that the password manager will remember all the rest. "Putting all your passwords in one basket" is a concern of course, but I think the pros outweigh the cons.

Why a YubiKey?

To take digital security to the next level you should use two-factor authentication (2FA). 2FA is an extra "thing" in addition to your password that you need to be able to log in. It could be a code sent to your phone over SMS (insecure), to your email (slightly better), a code from a 2FA app on your phone such as Aegis Authenticator (good), or from a hardware token (most secure).
It's easy to think that I went with a YubiKey because it's the most secure option, but the biggest reason is that a YubiKey is more convenient than a 2FA app. With a 2FA app you have to whip out your phone, open the 2FA app, locate the correct site, and then copy the TOTP code into the website (quickly, before the code changes). It's honestly not that convenient, even for someone like me who's used this setup for years. With a YubiKey you plug it into a USB port and press it when it flashes. Or on the phone you can use NFC. NFC is slightly more annoying compared to plugging it in as you need to move/hold it in a specific spot, yet it's still preferable to having to jump between apps on the phone.

There are hardware keys other than YubiKey of course. I've used YubiKey for years and have had a good experience. Don't fix what isn't broken.

The setup

Here are a few quick notes on how I set up her new accounts:

Password management with Bitwarden

The first thing we did was set up Bitwarden as the password manager for her. I chose the family plan so I can handle the billing. To give her access I installed Bitwarden as:

I gave her a YubiKey and registered it with Bitwarden for additional security. As a backup I also registered my own YubiKeys on her account; if she loses her key we still have others she can use. Although it was a bit confusing for her, I think she appreciates not having to remember a dozen different passwords and can simply remember one (stronger) password. We can also share passwords easily via Bitwarden (for newspapers, Spotify, etc). The YubiKey itself is very user friendly and she hasn't run into any usability issues.

Email on Fastmail

With the core security up and running the next step was to change her email:

Gave her an email address on Fastmail with her own domain (<firstname>@<lastname>.com). She has a basic account that I manage (there's a Duo plan that I couldn't migrate to at this time).
I secured the account with our YubiKeys and a generated password stored in Bitwarden.
We bolstered the security of her old Hotmail account by generating a new password and registering our YubiKeys.
Forwarded all email from her old Hotmail address to her new address.

With this done she has a secure email account with an email address that she owns. As is proper she's been changing her contact information and changing email address in her other services. It's a slow process but I can't be too critical—I still have a few services that use my old Gmail address even though I migrated to my own domain more than a decade ago.

Notes on recovery and redundancy

It's great to worry about phishing, weak passwords, and getting hacked. But for most people the much bigger risk is to forget your password or lose your second-factor auth, and get locked out that way. To reduce the risk of losing access to her accounts we have:

YubiKeys for all accounts.
The recovery codes for all accounts are written down and secured.
My own accounts can recover her Bitwarden and Fastmail accounts via their built-in recovery functionality.

Perfect is the enemy of good

Some go further than we've done here, others do less, and I think that's fine. It's important to not compare yourself with others too much; even small security measures make a big difference in practice. Not doing anything at all because you feel overwhelmed is worse than doing something, even something as simple as making sure you're using a strong password for your email account.

3 months ago 53 votes
First impressions of Ghostty

There are two conflicting forces in play in setting up your computer environment:

It's common to find people get stuck at the extreme ends of the spectrum; some programmers refuse to configure or learn their tools at all, while others get stuck re-configuring their setups constantly without any productivity gains to show for it. Finding a balance can be tricky.

With regards to terminals I've been using alacritty for many years. It gets the job done but I don't know if I'm missing out on anything. I've been meaning to look at alternatives like wezterm and kitty but I never got far enough to try them out. On one hand it's just a terminal, what difference could it make?

Enter Ghostty, a terminal so hyped up it made me drop any useful things I was working on and see what the fuss was about. I don't quite get why people hype up a terminal of all things, but here we are. Ghostty didn't revolutionize my setup or anything, but I admit that Ghostty is quite nice and it has replaced alacritty as my terminal.

I just want a blank canvas without any decorations

One of the big selling points of Ghostty is its native platform integration. It's supposed to integrate well with your window manager so it looks the same and gives you some extra functionality… But I don't know why I should care—I just want a big square without decorations of any kind. You're supposed to be able to simply turn off any window decorations:

```
window-decoration = false
```

At the moment there's a bug that requires you to set some weird GTK settings to fully remove the borders:

```
gtk-titlebar = false
gtk-adwaita = false
```

It's unfortunate as I haven't done any GTK configuration on my machine (I use XMonad as my window manager and I don't have any window decorations anywhere). There might be some useful native features I don't know about. The password input style is neat for instance, although I'm not sure it does anything functionally different compared to other terminals:

Cursor invert

```
cursor-invert-fg-bg = true
```

In alacritty I've had the cursor invert the background and foreground, and you can do that in Ghostty too. I ran into an issue where it interferes with indent-blankline.nvim, making the cursor very hard to spot in indents (taking the color of the indent guides, which is by design low contrast with the background). Annoying, but it gave me the shove I needed to try out different plugins to see if the problem persisted. I ended up with (an even nicer) setup using snacks.nvim that doesn't hide the cursor:

Left: indent-blankline.nvim (cursor barely visible). Right: snacks.nvim (cursor visible and it highlights scope).

Minimum contrast

Unreadable ls output is a staple of the excellent Linux UX. It might look like this:

Super annoying. You can of course configure the ls output colors but that's just for one program and it won't automatically follow when you ssh to another server. Ghostty's minimum-contrast option ensures that the text and background always have enough contrast to be visible:

```
minimum-contrast = 1.05
```

Most excellent. This feature has the potential to break "eye candy" features, such as the Neovim indent-line plugins, if you use a low contrast configuration. I still run into minor issues from time to time.

Hide cursor while typing

```
mouse-hide-while-typing = true
```

A small quality-of-life feature is the ability to hide the cursor when typing. I didn't know I needed this in my life.
Consistent font sizing between desktop and laptop

With alacritty I have an annoying problem where I need to use a very different font size on my laptop and my desktop (8 and 12). This wasn't always the case and I think something may have changed in alacritty, but I'm not sure. Ghostty doesn't have this problem and I can now use the same font settings across my machines (font-size = 16).

Ligature support

The issue for adding ligatures to alacritty was closed eight years ago, and even though I wanted to try ligatures I couldn't be bothered to "run a low quality fork". Ghostty seems like the opposite of "low quality" and it renders Iosevka's ligatures very well:

My configured ligatures of Iosevka, rendered in Ghostty.

Overall I feel that the font rendering in Ghostty is a little better than in alacritty, although that might be recency bias. I'm still undecided on ligatures but I love that I don't have to feel limited by the terminal. I use a custom Iosevka build with these Ghostty settings:

```
font-family = IosevkaTreeLig Nerd Font
font-style = Medium
font-style-bold = Bold
font-style-italic = Medium Italic
font-style-bold-italic = Bold Italic
font-size = 16
```

Colorscheme

While Ghostty has an absolutely excellent theme selector with a bunch of included themes (ghostty +list-themes), melange-nvim wasn't included, so I had to configure the colorscheme myself. It was fairly straightforward even though the palette = 0= syntax was a bit surprising:

```
# The dark variant of melange
background = #292522
foreground = #ECE1D7

palette = 0=#867462
palette = 1=#D47766
palette = 2=#85B695
palette = 3=#EBC06D
palette = 4=#A3A9CE
palette = 5=#CF9BC2
palette = 6=#89B3B6
palette = 7=#ECE1D7
palette = 8=#34302C
palette = 9=#BD8183
palette = 10=#78997A
palette = 11=#E49B5D
palette = 12=#7F91B2
palette = 13=#B380B0
palette = 14=#7B9695
palette = 15=#C1A78E

# I think it's nice to colorize the selection too
selection-background = #403a36
selection-foreground = #c1a78e
```

I'm happy with Ghostty

In the end Ghostty has improved my setup and I'm happy I took the time to try it out. It took a little more time than "just launch it" but it absolutely wasn't a big deal. The reward was a few pleasant improvements that have improved my life a little. And perhaps most important of all: I'm now an alpha Nerd that uses a terminal written in Zig.

Did I create a custom highlighter for the Ghostty configuration file just to have proper syntax highlighting for this one blog post? You bet I did. (It's a simple treesitter grammar.)

4 months ago 62 votes

More in technology

Sierpiński triangle? In my bitwise AND?

Exploring a peculiar bit-twiddling hack at the intersection of 1980s geek sensibilities.

yesterday 4 votes
Reverse engineering the 386 processor's prefetch queue circuitry

In 1985, Intel introduced the groundbreaking 386 processor, the first 32-bit processor in the x86 architecture. To improve performance, the 386 has a 16-byte instruction prefetch queue. The purpose of the prefetch queue is to fetch instructions from memory before they are needed, so the processor usually doesn't need to wait on memory while executing instructions. Instruction prefetching takes advantage of times when the processor is "thinking" and the memory bus would otherwise be unused. In this article, I look at the 386's prefetch queue circuitry in detail. One interesting circuit is the incrementer, which adds 1 to a pointer to step through memory. This sounds easy enough, but the incrementer uses complicated circuitry for high performance. The prefetch queue uses a large network to shift bytes around so they are properly aligned. It also has a compact circuit to extend signed 8-bit and 16-bit numbers to 32 bits. There aren't any major discoveries in this post, but if you're interested in low-level circuits and dynamic logic, keep reading. The photo below shows the 386's shiny fingernail-sized silicon die under a microscope. Although it may look like an aerial view of a strangely-zoned city, the die photo reveals the functional blocks of the chip. The Prefetch Unit in the upper left is the relevant block. In this post, I'll discuss the prefetch queue circuitry (highlighted in red), skipping over the prefetch control circuitry to the right. The Prefetch Unit receives data from the Bus Interface Unit (upper right) that communicates with memory. The Instruction Decode Unit receives prefetched instructions from the Prefetch Unit, byte by byte, and decodes the opcodes for execution. This die photo of the 386 shows the location of the registers. Click this image (or any other) for a larger version. The left quarter of the chip consists of stripes of circuitry that appears much more orderly than the rest of the chip. This grid-like appearance arises because each functional block is constructed (for the most part) by repeating the same circuit 32 times, once for each bit, side by side. Vertical data lines run up and down, in groups of 32 bits, connecting the functional blocks. To make this work, each circuit must fit into the same width on the die; this layout constraint forces the circuit designers to develop a circuit that uses this width efficiently without exceeding the allowed width. The circuitry for the prefetch queue uses the same approach: each circuit is 66 µm wide1 and repeated 32 times. As will be seen, fitting the prefetch circuitry into this fixed width requires some layout tricks. What the prefetcher does The purpose of the prefetch unit is to speed up performance by reading instructions from memory before they are needed, so the processor won't need to wait to get instructions from memory. Prefetching takes advantage of times when the memory bus is otherwise idle, minimizing conflict with other instructions that are reading or writing data. In the 386, prefetched instructions are stored in a 16-byte queue, consisting of four 32-bit blocks.2 The diagram below zooms in on the prefetcher and shows its main components. You can see how the same circuit (in most cases) is repeated 32 times, forming vertical bands. At the top are 32 bus lines from the Bus Interface Unit. These lines provide the connection between the datapath and external memory, via the Bus Interface Unit. 
These lines form a triangular pattern as the 32 horizontal lines on the right branch off and form 32 vertical lines, one for each bit. Next are the fetch pointer and the limit register, with a circuit to check if the fetch pointer has reached the limit. Note that the two low-order bits (on the right) of the incrementer and limit check circuit are missing. At the bottom of the incrementer, you can see that some bit positions have a blob of circuitry missing from others, breaking the pattern of repeated blocks. The 16-byte prefetch queue is below the incrementer. Although this memory is the heart of the prefetcher, its circuitry takes up a relatively small area. A close-up of the prefetcher with the main blocks labeled. At the right, the prefetcher receives control signals. The bottom part of the prefetcher shifts data to align it as needed. A 32-bit value can be split across two 32-bit rows of the prefetch buffer. To handle this, the prefetcher includes a data shift network to shift and align its data. This network occupies a lot of space, but there is no active circuitry here: just a grid of horizontal and vertical wires. Finally, the sign extend circuitry converts a signed 8-bit or 16-bit value into a signed 16-bit or 32-bit value as needed. You can see that the sign extend circuitry is highly irregular, especially in the middle. A latch stores the output of the prefetch queue for use by the rest of the datapath. Limit check If you've written x86 programs, you probably know about the processor's Instruction Pointer (EIP) that holds the address of the next instruction to execute. As a program executes, the Instruction Pointer moves from instruction to instruction. However, it turns out that the Instruction Pointer doesn't actually exist! Instead, the 386 has an "Advance Instruction Fetch Pointer", which holds the address of the next instruction to fetch into the prefetch queue. But sometimes the processor needs to know the Instruction Pointer value, for instance, to determine the return address when calling a subroutine or to compute the destination address of a relative jump. So what happens? The processor gets the Advance Instruction Fetch Pointer address from the prefetch queue circuitry and subtracts the current length of the prefetch queue. The result is the address of the next instruction to execute, the desired Instruction Pointer value. The Advance Instruction Fetch Pointer—the address of the next instruction to prefetch—is stored in a register at the top of the prefetch queue circuitry. As instructions are prefetched, this pointer is incremented by the prefetch circuitry. (Since instructions are fetched 32 bits at a time, this pointer is incremented in steps of four and the bottom two bits are always 0.) But what keeps the prefetcher from prefetching too far and going outside the valid memory range? The x86 architecture infamously uses segments to define valid regions of memory. A segment has a start and end address (known as the base and limit) and memory is protected by blocking accesses outside the segment. The 386 has six active segments; the relevant one is the Code Segment that holds program instructions. Thus, the limit address of the Code Segment controls when the prefetcher must stop prefetching.3 The prefetch queue contains a circuit to stop prefetching when the fetch pointer reaches the limit of the Code Segment. In this section, I'll describe that circuit. Comparing two values may seem trivial, but the 386 uses a few tricks to make this fast. 
The basic idea is to use 30 XOR gates to compare the bits of the two registers. (Why 30 bits and not 32? Since 32 bits are fetched at a time, the bottom bits of the address are 00 and can be ignored.) If the two registers match, all the XOR values will be 0, but if they don't match, an XOR value will be 1. Conceptually, connecting the XORs to a 32-input OR gate will yield the desired result: 0 if all bits match and 1 if there is a mismatch. Unfortunately, building a 32-input OR gate using standard CMOS logic is impractical for electrical reasons, as well as inconveniently large to fit into the circuit. Instead, the 386 uses dynamic logic to implement a spread-out NOR gate with one transistor in each column of the prefetcher. The schematic below shows the implementation of one bit of the equality comparison. The mechanism is that if the two registers differ, the transistor on the right is turned on, pulling the equality bus low. This circuit is replicated 30 times, comparing all the bits: if there is any mismatch, the equality bus will be pulled low, but if all bits match, the bus remains high. The three gates on the left implement XNOR; this circuit may seem overly complicated, but it is a standard way of implementing XNOR. The NOR gate at the right blocks the comparison except during clock phase 2. (The importance of this will be explained below.) This circuit is repeated 30 times to compare the registers. The equality bus travels horizontally through the prefetcher, pulled low if any bits don't match. But what pulls the bus high? That's the job of the dynamic circuit below. Unlike regular static gates, dynamic logic is controlled by the processor's clock signals and depends on capacitance in the circuit to hold data. The 386 is controlled by a two-phase clock signal.4 In the first clock phase, the precharge transistor below turns on, pulling the equality bus high. In the second clock phase, the XOR circuits above are enabled, pulling the equality bus low if the two registers don't match. Meanwhile, the CMOS switch turns on in clock phase 2, passing the equality bus's value to the latch. The "keeper" circuit keeps the equality bus held high unless it is explicitly pulled low, to avoid the risk of the voltage on the equality bus slowly dissipating. The keeper uses a weak transistor to keep the bus high while inactive. But if the bus is pulled low, the keeper transistor is overpowered and turns off. This is the output circuit for the equality comparison. This circuit is located to the right of the prefetcher. This dynamic logic reduces power consumption and circuit size. Since the bus is charged and discharged during opposite clock phases, you avoid steady current through the transistors. (In contrast, an NMOS processor like the 8086 might use a pull-up on the bus. When the bus is pulled low, you would end up with current flowing through the pull-up and the pull-down transistors. This would increase power consumption, make the chip run hotter, and limit your clock speed.)

The incrementer

After each prefetch, the Advance Instruction Fetch Pointer must be incremented to hold the address of the next instruction to prefetch. Incrementing this pointer is the job of the incrementer. (Because each fetch is 32 bits, the pointer is incremented by 4 each time. But in the die photo, you can see a notch in the incrementer and limit check circuit where the circuitry for the bottom two bits has been omitted.
Thus, the incrementer's circuitry increments its value by 1, so the pointer (with two zero bits appended) increases in steps of 4.) Building an incrementer circuit is straightforward: for example, you can use a chain of 30 half-adders. The problem is that incrementing a 30-bit value at high speed is difficult because of the carries from one position to the next. It's similar to calculating 99999999 + 1 in decimal; you need to tediously carry the 1, carry the 1, carry the 1, and so forth, through all the digits, resulting in a slow, sequential process. The incrementer uses a faster approach. First, it computes all the carries at high speed, almost in parallel. Then it computes each output bit in parallel from the carries—if there is a carry into a position, it toggles that bit. Computing the carries is straightforward in concept: if there is a block of 1 bits at the end of the value, all those bits will produce carries, but carrying is stopped by the rightmost 0 bit. For instance, incrementing binary 11011 results in 11100; there are carries from the last two bits, but the zero stops the carries. A circuit to implement this was developed at the University of Manchester in England way back in 1959, and is known as the Manchester carry chain. In the Manchester carry chain, you build a chain of switches, one for each data bit, as shown below. For a 1 bit, you close the switch, but for a 0 bit you open the switch. (The switches are implemented by transistors.) To compute the carries, you start by feeding in a carry signal at the right. The signal will go through the closed switches until it hits an open switch, and then it will be blocked.5 The outputs along the chain give us the desired carry value at each position. Concept of the Manchester carry chain, 4 bits. Since the switches in the Manchester carry chain can all be set in parallel and the carry signal blasts through the switches at high speed, this circuit rapidly computes the carries we need. The carries then flip the associated bits (in parallel), giving us the result much faster than a straightforward adder. There are complications, of course, in the actual implementation. The carry signal in the carry chain is inverted, so a low signal propagates through the carry chain to indicate a carry. (It is faster to pull a signal low than high.) But something needs to make the line go high when necessary. As with the equality circuitry, the solution is dynamic logic. That is, the carry line is precharged high during one clock phase and then processing happens in the second clock phase, potentially pulling the line low. The next problem is that the carry signal weakens as it passes through multiple transistors and long lengths of wire. The solution is that each segment has a circuit to amplify the signal, using a clocked inverter and an asymmetrical inverter. Importantly, this amplifier is not in the carry chain path, so it doesn't slow down the signal through the chain. The Manchester carry chain circuit for a typical bit in the incrementer. The schematic above shows the implementation of the Manchester carry chain for a typical bit. The chain itself is at the bottom, with the transistor switch as before. During clock phase 1, the precharge transistor pulls this segment of the carry chain high. During clock phase 2, the signal on the chain goes through the "clocked inverter" at the right to produce the local carry signal.
If there is a carry, the next bit is flipped by the XOR gate, producing the incremented output.6 The "keeper/amplifier" is an asymmetrical inverter that produces a strong low output but a weak high output. When there is no carry, its weak output keeps the carry chain pulled high. But as soon as a carry is detected, it strongly pulls the carry chain low to boost the carry signal. But this circuit still isn't enough for the desired performance. The incrementer uses a second carry technique in parallel: carry skip. The concept is to look at blocks of bits and allow the carry to jump over the entire block. The diagram below shows a simplified implementation of the carry skip circuit. Each block consists of 3 to 6 bits. If all the bits in a block are 1's, then the AND gate turns on the associated transistor in the carry skip line. This allows the carry skip signal to propagate (from left to right), a block at a time. When it reaches a block with a 0 bit, the corresponding transistor will be off, stopping the carry as in the Manchester carry chain. The AND gates all operate in parallel, so the transistors are rapidly turned on or off in parallel. Then, the carry skip signal passes through a small number of transistors, without going through any logic. (The carry skip signal is like an express train that skips most stations, while the Manchester carry chain is the local train to all the stations.) Like the Manchester carry chain, the implementation of carry skip needs precharge circuits on the lines, a keeper/amplifier, and clocked logic, but I'll skip the details. An abstracted and simplified carry-skip circuit. The block sizes don't match the 386's circuit. One interesting feature is the layout of the large AND gates. A 6-input AND gate is a large device, difficult to fit into one cell of the incrementer. The solution is that the gate is spread out across multiple cells. Specifically, the gate uses a standard CMOS NAND gate circuit with NMOS transistors in series and PMOS transistors in parallel. Each cell has an NMOS transistor and a PMOS transistor, and the chains are connected at the end to form the desired NAND gate. (Inverting the output produces the desired AND function.) This spread-out layout technique is unusual, but keeps each bit's circuitry approximately the same size. The incrementer circuitry was tricky to reverse engineer because of these techniques. In particular, most of the prefetcher consists of a single block of circuitry repeated 32 times, once for each bit. The incrementer, on the other hand, consists of four different blocks of circuitry, repeating in an irregular pattern. Specifically, one block starts a carry chain, a second block continues the carry chain, and a third block ends a carry chain. The block before the ending block is different (one large transistor to drive the last block), making four variants in total. This irregular pattern is visible in the earlier photo of the prefetcher. The alignment network The bottom part of the prefetcher rotates data to align it as needed. Unlike some processors, the x86 does not enforce aligned memory accesses. That is, a 32-bit value does not need to start on a 4-byte boundary in memory. As a result, a 32-bit value may be split across two 32-bit rows of the prefetch queue. Moreover, when the instruction decoder fetches one byte of an instruction, that byte may be at any position in the prefetch queue. 
To deal with these problems, the prefetcher includes an alignment network that can rotate bytes to output a byte, word, or four bytes with the alignment required by the rest of the processor. The diagram below shows part of this alignment network. Each bit exiting the prefetch queue (top) has four wires, for rotates of 24, 16, 8, or 0 bits. Each rotate wire is connected to one of the 32 horizontal bit lines. Finally, each horizontal bit line has an output tap, going to the datapath below. (The vertical lines are in the chip's lower M1 metal layer, while the horizontal lines are in the upper M2 metal layer. For this photo, I removed the M2 layer to show the underlying layer. Shadows of the original horizontal lines are still visible.) Part of the alignment network. The idea is that by selecting one set of vertical rotate lines, the 32-bit output from the prefetch queue will be rotated left by that amount. For instance, to rotate by 8, bits are sent down the "rotate 8" lines. Bit 0 from the prefetch queue will energize horizontal line 8, bit 1 will energize horizontal line 9, and so forth, with bit 31 wrapping around to horizontal line 7. Since horizontal bit line 8 is connected to output 8, the result is that bit 0 is output as bit 8, bit 1 is output as bit 9, and so forth. The four possibilities for aligning a 32-bit value. The four bytes above are shifted as specified to produce the desired output below. For the alignment process, one 32-bit output may be split across two 32-bit entries in the prefetch queue in four different ways, as shown above. These combinations are implemented by multiplexers and drivers. Two 32-bit multiplexers select the two relevant rows in the prefetch queue (blue and green above). Four 32-bit drivers are connected to the four sets of vertical lines, with one set of drivers activated to produce the desired shift. Each byte of each driver is wired to achieve the alignment shown above. For instance, the rotate-8 driver gets its top byte from the "green" multiplexer and the other three bytes from the "blue" multiplexer. The result is that the four bytes, split across two queue rows, are rotated to form an aligned 32-bit value. Sign extension The final circuit is sign extension. Suppose you want to add an 8-bit value to a 32-bit value. An unsigned 8-bit value can be extended to 32 bits by simply filling the upper bits with zeroes. But for a signed value, it's trickier. For instance, -1 is the eight-bit value 0xFF, but the 32-bit value is 0xFFFFFFFF. To convert an 8-bit signed value to 32 bits, the top 24 bits must be filled in with the top bit of the original value (which indicates the sign). In other words, for a positive value, the extra bits are filled with 0, but for a negative value, the extra bits are filled with 1. This process is called sign extension.9 In the 386, a circuit at the bottom of the prefetcher performs sign extension for values in instructions. This circuit supports extending an 8-bit value to 16 bits or 32 bits, as well as extending a 16-bit value to 32 bits. This circuit will extend a value with zeros or with the sign, depending on the instruction. The schematic below shows one bit of this sign extension circuit. It consists of a latch on the left and right, with a multiplexer in the middle. The latches are constructed with a standard 386 circuit using a CMOS switch (see footnote).7 The multiplexer selects one of three values: the bit value from the swap network, 0 for sign extension, or 1 for sign extension. 
Sign extension

The final circuit is sign extension. Suppose you want to add an 8-bit value to a 32-bit value. An unsigned 8-bit value can be extended to 32 bits by simply filling the upper bits with zeroes. But for a signed value, it's trickier. For instance, -1 is the eight-bit value 0xFF, but the 32-bit value is 0xFFFFFFFF. To convert an 8-bit signed value to 32 bits, the top 24 bits must be filled in with the top bit of the original value (which indicates the sign). In other words, for a positive value, the extra bits are filled with 0, but for a negative value, the extra bits are filled with 1. This process is called sign extension.[9]

In the 386, a circuit at the bottom of the prefetcher performs sign extension for values in instructions. This circuit supports extending an 8-bit value to 16 bits or 32 bits, as well as extending a 16-bit value to 32 bits. This circuit will extend a value with zeros or with the sign, depending on the instruction.

The schematic below shows one bit of this sign extension circuit. It consists of a latch on the left and right, with a multiplexer in the middle. The latches are constructed with a standard 386 circuit using a CMOS switch (see footnote [7]). The multiplexer selects one of three values: the bit value from the swap network, 0 for sign extension, or 1 for sign extension. The multiplexer is constructed from a CMOS switch (to pass the bit value when selected) and two transistors (to force the 0 or 1 values). This circuit is replicated 32 times, although the bottom byte only has the latches, not the multiplexer, as sign extension does not modify the bottom byte.

The sign extend circuit associated with bits 31-8 from the prefetcher.

The second part of the sign extension circuitry determines if the bits should be filled with 0 or 1 and sends the control signals to the circuit above. The gates on the left determine if the sign extension bit should be a 0 or a 1. For a 16-bit sign extension, this bit comes from bit 15 of the data, while for an 8-bit sign extension, the bit comes from bit 7. The four gates on the right generate the signals to sign extend each bit, producing separate signals for the bit range 31-16 and the range 15-8.

This circuit determines which bits should be filled with 0 or 1.

The layout of this circuit on the die is somewhat unusual. Most of the prefetcher circuitry consists of 32 identical columns, one for each bit.[8] The circuitry above is implemented once, using about 16 gates (buffers and inverters are not shown above). Despite this, the circuitry above is crammed into bit positions 17 through 7, creating irregularities in the layout. Moreover, the implementation of the circuitry in silicon is unusual compared to the rest of the 386. Most of the 386's circuitry uses the two metal layers for interconnection, minimizing the use of polysilicon wiring. However, the circuit above also uses long stretches of polysilicon to connect the gates.

Layout of the sign extension circuitry. This circuitry is at the bottom of the prefetch queue.

The diagram above shows the irregular layout of the sign extension circuitry amid the regular datapath circuitry that is 32 bits wide. The sign extension circuitry is shown in green; this is the circuitry described at the top of this section, repeated for each bit 31-8. The circuitry for bits 15-8 has been shifted upward, perhaps to make room for the sign extension control circuitry, indicated in red. Note that the layout of the control circuitry is completely irregular, since there is one copy of the circuitry and it has no internal structure. One consequence of this layout is the wasted space to the left and right of this circuitry block, the tan regions with no circuitry except vertical metal lines passing through. At the far right, a block of circuitry to control the latches has been wedged under bit 0.

Intel's designers went to great effort to minimize the size of the processor die, since a smaller die saves substantial money. This layout must have been the most efficient they could manage, but I find it aesthetically displeasing compared to the regularity of the rest of the datapath.
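The operation the circuit performs is easy to state in software. Here is a minimal sketch (mine, not derived from the 386's logic) of extending an 8-bit or 16-bit field to 32 bits, with either zero fill or sign fill:

    def extend(value, from_bits, signed):
        """Extend an 8- or 16-bit field to 32 bits.

        If signed, copy the field's top bit into all the upper bits;
        otherwise fill the upper bits with zeros.
        """
        mask = (1 << from_bits) - 1
        value &= mask
        sign_bit = value >> (from_bits - 1)      # bit 7 or bit 15
        if signed and sign_bit:
            value |= 0xFFFFFFFF & ~mask          # fill bits 31..from_bits with 1s
        return value

    assert extend(0xFF, 8, signed=True) == 0xFFFFFFFF     # -1 stays -1
    assert extend(0xFF, 8, signed=False) == 0x000000FF    # 255 zero-extended
    assert extend(0x8000, 16, signed=True) == 0xFFFF8000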
How instructions flow through the chip

Instructions follow a tortuous path through the 386 chip. First, the Bus Interface Unit in the upper right corner reads instructions from memory and sends them over a 32-bit bus (blue) to the prefetch unit. The prefetch unit stores the instructions in the 16-byte prefetch queue.

Instructions follow a twisting path to and from the prefetch queue.

How is an instruction executed from the prefetch queue? It turns out that there are two distinct paths. Suppose you're executing an instruction to add the hex value 12345678 to the EAX register. The prefetch queue will hold the five bytes 05 (the opcode), 78, 56, 34, and 12. The prefetch queue provides opcodes to the decoder one byte at a time over the 8-bit bus shown in red. The bus takes the lowest 8 bits from the prefetch queue's alignment network and sends this byte to a buffer (the small square at the head of the red arrow). From there, the opcode travels to the instruction decoder.[10] The instruction decoder, in turn, uses large tables (PLAs) to convert the x86 instruction into a 111-bit internal format with 19 different fields.[11]

The data bytes of an instruction, on the other hand, go from the prefetch queue to the ALU (Arithmetic Logic Unit) through a 32-bit data bus (orange). Unlike the previous buses, this data bus is spread out, with one wire through each column of the datapath. This bus extends through the entire datapath so values can also be stored into registers. For instance, the MOV (move) instruction can store a value from an instruction (an "immediate" value) into a register.
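For reference, the bytes in that example follow the standard x86 encoding: opcode 05 is ADD EAX, imm32, and the 32-bit immediate is stored least-significant byte first. A tiny sketch of pulling the pieces back apart:

    # The five instruction bytes from the example: ADD EAX, 0x12345678.
    # Opcode 0x05 means "ADD EAX, imm32"; the immediate follows in little-endian order.
    instruction = bytes([0x05, 0x78, 0x56, 0x34, 0x12])

    opcode = instruction[0]
    immediate = int.from_bytes(instruction[1:5], "little")

    assert opcode == 0x05
    assert immediate == 0x12345678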
Conclusions

The 386's prefetch queue contains about 7400 transistors, more than an Intel 8080 processor. (And this is just the queue itself; I'm ignoring the prefetch control logic.) This illustrates the rapid advance of processor technology: part of one functional unit in the 386 contains more transistors than an entire 8080 processor from 11 years earlier. And this unit is less than 3% of the entire 386 processor.

Every time I look at an x86 circuit, I see the complexity required to support backward compatibility, and I gain more understanding of why RISC became popular. The prefetcher is no exception. Much of the complexity is due to the 386's support for unaligned memory accesses, requiring a byte shift network to move bytes into 32-bit alignment. Moreover, at the other end of the instruction bus is the complicated instruction decoder that decodes intricate x86 instructions. Decoding RISC instructions is much easier.

In any case, I hope you've found this look at the prefetch circuitry interesting. I plan to write more about the 386, so follow me on Bluesky (@righto.com) or RSS for updates. I've written multiple articles on the 386 previously; a good place to start might be my survey of the 386 dies.

Footnotes and references

[1] The width of the circuitry for one bit changes a few times: while the prefetch queue and segment descriptor cache use a circuit that is 66 µm wide, the datapath circuitry is a bit tighter at 60 µm. The barrel shifter is even narrower at 54.5 µm per bit. Connecting circuits with different widths wastes space, since the wiring to connect the bits requires horizontal segments to adjust the spacing. But it also wastes space to use widths that are wider than needed. Thus, changes in the spacing are rare, happening only where the tradeoffs make it worthwhile. ↩

[2] The Intel 8086 processor had a six-byte prefetch queue, while the Intel 8088 (used in the original IBM PC) had a prefetch queue of just four bytes. In comparison, the 16-byte queue of the 386 seems luxurious. (Some 386 processors, however, are said to only use 12 bytes due to a bug.)

The prefetch queue assumes instructions are executed in linear order, so it doesn't help with branches or loops. If the processor encounters a branch, the prefetch queue is discarded. (In contrast, a modern cache will work even if execution jumps around.) Moreover, the prefetch queue doesn't handle self-modifying code. (It used to be common for code to change itself while executing to squeeze out extra performance.) By loading code into the prefetch queue and then modifying instructions, you could determine the size of the prefetch queue: if the old instruction was executed, it must be in the prefetch queue, but if the modified instruction was executed, it must be outside the prefetch queue. Starting with the Pentium Pro, x86 processors flush the prefetch queue if a write modifies a prefetched instruction. ↩

[3] The prefetch unit generates "linear" addresses that must be translated to physical addresses by the paging unit (ref). ↩

[4] I don't know which phase of the clock is phase 1 and which is phase 2, so I've assigned the numbers arbitrarily. The 386 creates four clock signals internally from a clock input CLK2 that runs at twice the processor's clock speed. The 386 generates a two-phase clock with non-overlapping phases. That is, there is a small gap between when the first phase is high and when the second phase is high. The 386's circuitry is controlled by the clock, with alternate blocks controlled by alternate phases. Since the clock phases don't overlap, this ensures that logic blocks are activated in sequence, allowing the orderly flow of data. But because the 386 uses CMOS, it also needs active-low clocks for the PMOS transistors. You might think that you could simply use the phase 1 clock as the active-low phase 2 clock and vice versa. The problem is that these clock phases overlap when used as active-low; there are times when both clock signals are low. Thus, the two clock phases must be explicitly inverted to produce the two active-low clock phases. I described the 386's clock generation circuitry in detail in this article. ↩

[5] The Manchester carry chain is typically used in an adder, which makes it more complicated than shown here. In particular, a new carry can be generated when two 1 bits are added. Since we're looking at an incrementer, this case can be ignored. The Manchester carry chain was first described in "Parallel addition in digital computers: a new fast 'carry' circuit". It was developed at the University of Manchester in 1959 and used in the Atlas supercomputer. ↩

[6] For some reason, the incrementer uses a completely different XOR circuit from the comparator, built from a multiplexer instead of logic. In the circuit below, the two CMOS switches form a multiplexer: if the first input is 1, the top switch turns on, while if the first input is a 0, the bottom switch turns on. Thus, if the first input is a 1, the second input passes through and then is inverted to form the output. But if the first input is a 0, the second input is inverted before the switch and then is inverted again to form the output. Thus, the second input is inverted if the first input is 1, which is a description of XOR.

The implementation of an XOR gate in the incrementer.

I don't see any clear reason why two different XOR circuits were used in different parts of the prefetcher. Perhaps the available space for the layout made a difference. Or maybe the different circuits have different timing or output current characteristics. Or it could just be the personal preference of the designers. ↩
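As a quick check on that description, here is a one-line behavioral model of the multiplexer-based XOR (a sketch of the logic, not the transistor circuit):

    def mux_xor(a, b):
        """Model of the incrementer's XOR gate built from a 2:1 multiplexer.

        If a == 1, b passes through the switch and is then inverted; if
        a == 0, b is inverted before the switch and inverted again after,
        so it passes through unchanged.
        """
        return (1 - b) if a else b

    # The mux-based circuit matches a plain XOR for all inputs.
    for a in (0, 1):
        for b in (0, 1):
            assert mux_xor(a, b) == a ^ b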
[7] The latch circuit is based on a CMOS switch (or transmission gate) and a weak inverter. Normally, the inverter loop holds the bit. However, if the CMOS switch is enabled, its output overpowers the signal from the weak inverter, forcing the inverter loop into the desired state. The CMOS switch consists of an NMOS transistor and a PMOS transistor in parallel. By setting the top control input high and the bottom control input low, both transistors turn on, allowing the signal to pass through the switch. Conversely, by setting the top input low and the bottom input high, both transistors turn off, blocking the signal. CMOS switches are used extensively in the 386 to form multiplexers, create latches, and implement XOR. ↩

[8] Most of the 386's control circuitry is to the right of the datapath, rather than awkwardly wedged into the datapath. So why is this circuit different? My hypothesis is that since the circuit needs the values of bit 15 and bit 7, it made sense to put the circuitry next to bits 15 and 7; if this control circuitry were off to the right, long wires would need to run from bits 15 and 7 to the circuitry. ↩

[9] In case this post is getting tedious, I'll provide a lighter footnote on sign extension. The obvious mnemonic for a sign extension instruction is SEX, but that mnemonic was too risqué for Intel. The Motorola 6809 processor (1978) used this mnemonic, as did the related 68HC12 microcontroller (1996). However, Steve Morse, architect of the 8086, stated that the sign extension instructions on the 8086 were initially named SEX but were renamed before release to the more conservative CBW and CWD (Convert Byte to Word and Convert Word to Double word). The DEC PDP-11 was a bit contradictory: it has a sign extend instruction with the mnemonic SXT, and the Jargon File claims that DEC engineers almost got SEX as the assembler mnemonic, but marketing forced the change. On the other hand, SEX was the official abbreviation for Sign Extend (see the PDP-11 Conventions Manual and the PDP-11 Paper Tape Software Handbook), and SEX was used in the microcode for sign extend. RCA's CDP1802 processor (1976) may have been the first with a SEX instruction, although there the mnemonic stood for the unrelated Set X instruction. See also this Retrocomputing Stack Exchange page. ↩

[10] It seems inconvenient to send instructions all the way across the chip from the Bus Interface Unit to the prefetch queue and then back across the chip to the instruction decoder, which is next to the Bus Interface Unit. But this was probably the best alternative for the layout, since you can't put everything close to everything. The 32-bit datapath circuitry is on the left, organized into 32 columns. It would be nice to put the Bus Interface Unit over there too, but there isn't room, so you end up with the wide 32-bit data bus going across the chip. Sending instruction bytes across the chip has less of an impact, since the instruction bus is just 8 bits wide. ↩

[11] See "Performance Optimizations of the 80386", Slager, Oct 1986, in Proceedings of ICCD, pages 165-168. ↩
