More from Civic Hax
Intro Hi there! In this post, I want to show off a fun little web app I made for visualizing parking tickets in Chicago, but because I've spent so much time on the overall project, I figured I'd share the story that got me to this point. In many ways, this work is the foundation for my interest in public records and transparency, so it has a very special place in my heart. If you're here just to play around with a cool app, click here. Otherwise, I hope you enjoy this post and find it interesting. Also - please don't hesitate to share harsh criticism or suggestions. Enjoy! Project Beginnings Back in 2014, while on vacation, my car was towed for allegedly being in a construction zone. The cost to get the car out was quoted at around $700 through tow fees and storage fees that increase each day by $35. That's obviously a lot of money, so the night I noticed the car was towed, I began looking for ways to get the ticket thrown out in court. After some digging, I found a dataset on data.cityofchicago.org which details every street closure, including closures for construction. As luck would have it, the address that my car was towed at had zero open construction permits, which seemed like a good reason to throw out the fines. The next day I confirmed the lack of permits by calling Chicago's permit office - they mentioned they only thing found was a canceled construction permit by Comed. That same day, I scheduled a court date for the following day. When I arrived in the court room, I was immediately pulled aside by a city employee and was told that my case was being thrown out. They explained that the ticket itself "didn't have enough information", but besides that I wasn't told much! The judge then signed a form that allowed the release of my car at zero cost. At this point, I'd imagine most folk would just walk away with some frustration, but would be overall happy and move on. For whatever reason, though, I'm too stubborn for that - I couldn't get over the fact that Chicago didn't just spend five minutes checking the ticket before I appeared in court. And what bothered me more was that there isn't some sort of sorts automated check. Surely Chicago has some process in place within its $200 million ticketing system to make sure it's not giving out invalid tickets... Spoiler: it doesn't. The more I thought about it, the more I started wondering why Chicago's court system only favors those who have the time and resources to go to court and started wondering if it was possible to automate these checks myself using historical data. The hunch started based on what I'd seen from the recent "Open Gov" movement in to start making data open to the public by default. When I first started, I mostly just worked with some already publicly available datasets previously released by several news organizations. WBEZ in particular released a fairly large dataset of towing data that, while it was a compelling dataset, it unfortunately ultimately wasn't useful for my little project. So instead, I started working on a similar goal to programatically find invalid tickets in Chicago. That work and its research led to my very first FOIA request, where after some trial and error, received the records for 14m parking tickets. Trial Development and Trial Geocoding With the data at hand, I started throwing many, many unix one liners and gnuplot at the problem. This method worked well enough for small one-off checks, but for anything else, it's a complete mess of sed and awk statements. After unix-by-default I moved on to Python, where I made a small batch of surprisingly powerful, but scraggly scripts. One script, for example, would check to see whether a residential parking permit ticket was valid by comparing a ticket's address against a set of streets and its ranges found on data.cityofchicago.org. The script ended up finding a lot of residential permit tickets given outside of the sign's listed boundaries. Sadly though, none of the findings were relevant, since a ticket is given based on the physical location of the sign, not what the sign's location data represents. These sorts or problems ended up putting a halt to these sorts of scripts. With the next development iteration, I switched to visualizing parking tickets, where I essentially continued until today. Problem was, the original data had no lat/lng and had to be geocoded.. So, back when I first started doing this, there weren't that many geocoding services that were reasonably priced. Google was certainly possible, but its licensing and usage limits made Google a total no-go. The other two options I found were from Equifax (expensive) and the US Post Office (also expensive). So I went down a several hundred hour rabbit hole of attempting to geocode all addresses myself - entirely under the philosophy that all tickets had a story to tell, and anything lower than 90% geocoding was inexecusable. The first attempt at geocoding was decent, but only matched about 30% of tickets. It worked by tokenizing addresses using a wonderful python library called usaddress, then comparing the "string distance" of a street address against a known set of address already paired with lat/long. It got me to a point where I could make my first map visualization using some simple plotting: Jupyter Notebook Still, with only 30% geocoded, I felt that I could do better through many different methods that continued to use string distance - some implementations much fancier than others. Other methods, ahem, used lots of sed and vim, which I'm still not proud of. There were some attempts where I got close to 90% of addresses geocoded, though I ended up throwing each one out in fear that the lack of validation would bite me in the future. Spoiler: it did. Eventually, my life was made much easier thanks to the folks at SmartyStreets, who set me up an account with unlimited geocoding requests. By itself, SmartyStreets was able to geocode around 50% of the addresses, but when combined with some of my string-distance autocorrection, that number shot up to 70%. It wasn't perfect, but it was "good enough" to move on, and didn't really require me to trust my own set corrections and the anxiety that comes with that. Another New Dataset Fast forward about six months - I hadn't gotten much further on visualization, but instead started focusing on my first blog post. A bit around that post, ProPublica published some excellent work on parking tickets and ended up releasing its own tickets datasets to the public, one of which was geocoded. I'm happy to say I had a small part in helping out with, and then started using it for my own work in the hopes of using a common dataset between groups. More Geocoding Fun A high level of ProPublica's geocoding process is described in pretty good detail here, so if you're interested in how the geocoding was done, you should check that out. In brief, the dataset's addresses were geocoded using Geocodio. The geocoding process they used was dramatically simplified by replacing each address's last two digits with zeroes. It worked surprisingly well, and brought the percent of geocoded tickets to 99%, but this method had the unfortunate side effect of reducing mapping accuracy. So, because I have more opinions than I care to admit about geocoding, I had to look into how well the geocoding was done. Geocoding Strangeness After a few checks against the geocoded results, I noticed that a large portion of of the addresses wasn't geocoded correctly - often in strange ways. One example comes from the tendency of geocoders to favor one direction over the other for some streets. As can be seen below, the geocoding done on Michigan and Wells is completely demolished. What’s particularly notable here is that Michigan Ave has zero instances of “S Michigan” swapped to “N Michigan”. N Wells S Wells Total Total Tickets (pre-geocoding) 280,160 127,064 407,224 Unique Addresses (pre-geocoding) 3,254 3,190 6,444 Total Tickets (post-geocoding) 342,020 37,611 379,631 Unique Addresses (post-geocoding) 40 57 97 Direction is swapped 246 54,309 54,555 N Michigan S Michigan Total Total Tickets (pre-geocoding) 102,225 392,391 494,616 Unique Addresses (pre-geocoding) 2,916 16,585 19,501 Total Tickets (post-geocoding) 15,760 509,485 512,401 Unique Addresses (post-geocoding) 136 144 280 Direction is swapped 102,225 0 102,225 Interestingly, the total ticket count for Wells drops by about 30k after being geocoding. This turned out to be from Geocodio strangely renaming ‘200 S Wells’ to ‘200 W Hill St’. Similar also happens with numbered addresses similar to “83rd St”, where a steet name is often renamed to "100th". 100th street's ticket count appears to have six times as many tickets as it actually does because of this. These are just a few examples, but it's probably safe to say it's tip of the iceberg. My gut tells me that the address simplification had a major effect here. I think a simple solution of replacing the last two digits with 01 if odd, and 02 if even - instead of 0s - would do the trick. Quick Geocoding Story A few years ago, I approached the (ex) Chicago Chief of Open Data and asked if he, or someone in Chicago, was interested in a copy of geocoded addresses I’d worked on, since they don’t actually have anything geocoded (heh). He politely declined, and mentioned that since Chicago has its own geocoder, my geocoded addresses weren’t useful. I then asked if Chicago could run the parking ticket addresses through their geocoder – something that would then be requestable through FOIA. His response was, “No, because that would require a for loop”. Hmmmmm. Visualization to Validate Geocoding In one of my stranger attempts to validate the geocoding results, I wanted to see whether it was possible to use the distance a ticketer traveled to check if a particular geocode was botched or not. It ended up being less useful I'd hoped, but out of it, I made some pretty interesting looking visualizations: Left: Ticketer paths every 15 minutes with 24 hours decay (Animated). Right: All ticketer paths for a full month. This validation code, while it wasn't even remotely useful, ended up being the foundation for what I'm calling my final personal attempt to visualize Chicago's parking tickets. Final Version And with that, here is the final version of the parking ticket visualization app. Click the image below to navigate to the app. Bokeh Frontend The frontend for the app uses a python framework named Bokeh. It's an extremely powerful interactive visualization framework that dramatically reduces how much code it takes to make both simple and complex interfaces. It's not without its (many) faults, but in python land, I personally consider it to be the best available. That said, I wish the Bokeh devs would invest a lot more time improving the serverside documentation. There's just way too much guesswork, buggy behavior, and special one-offs required to match Bokeh's non-server functionality. Flask/Pandas Backend The original backend was PostgreSQL along with PostGIS (for GeoJSON creation) and TimeScaleDB (for timeseries charting). Sadly, PostgreSQL started struggling with larger queries, combined with memory exhausting itself quicker than I'd hoped. In the end, I ended up writing a Flask service that runs in PyPy with Pandas as an in-memory data store. This ended up working surprisingly well in both speed and modularity. That said, there’s still a lot more room for speed improvement - especially in caching! If I ever revisit the backend, I'll try giving PostgreSQL another chance, but throw a bunch more optimization its way. Please feel free to check out the application code. Interesting Findings To wrap this post up, I wanted to share some of the things I've found while playing around with the app. Everything below comes from screenshots from the app, so pardon any loss of information. Snow Route Tickets Outside 3AM-7AM If you park your car in a "snow route" between 3AM-7AM, you will get a ticket. Interestingly, between the hours of 8am to 11pm, over 1,000 tickets have been given over the past 15 years - mostly by CPD. Of these, only 6% have been contested – 6 of which the car owner was still liable for some reason. The other 94% should have automatically been thrown out, but Chicago has no systematic way of doing this. Shame. Highest value line = CPD's tickets. Expired Meter Within Central Business District – Outside of CBD. Downtown Chicago has an area that's officially described as the “Central Business District”. Within this area, tickets for expired meters end up costing $65 instead of the normal $50. In the past 13 years, over 52k of these tickets have been given outside of the Central Business District. From those 52k, only 5k tickets were taken to court, and 70% were thrown out. Overall, Chicago made around $725k off the $15 difference. Another instance of where Chicago could have systematically thrown out many tickets, but didn't. Downtown removed to highlight non-central business district areas. City sticker expiration tickets.. in the middle of July? Up until 2015, Chicago's city stickers expired at the end of June. A massive spike of tickets is bound to happen, but what's strange is that the spike happens in the middle of July, not the start of it. At the onset of each spikes, Chicago is making somewhere between $500k and $1m. This is a $200 ticket that has a nonstop effect on Chicago's poorer citizens. For more on this, check out this great ProPublica article. (The two different colors here come from the fact that Chicago changed the ticket description sometime in 2012.) Bike Lane Tickets With the increasing prevalence of bike lanes and the danger from cars parked in them, I was expecting to see a lot of tickets for parking in a bike lane. Unfortunately, that’s not the case. For example, in 2017, only around 3,000 tickets for parking in a bike lane were given. For more on this, click here. Wards Have different Street Cleaning times I found some oddities while searching for street cleaning tickets given at odd hours. Wards 48, 49 and 50 seem to give out parking tickets much later than the rest of Chicago. Then, starting at 6am, Ward 40 starts giving out tickets earlier than the rest of Chicago. Posted signs probably make it relatively clear that there’s going to be street cleaning, but it’s also probably fair to say that this is understandably confusing. Hopefully there’s a good reason these wards decided to be special. What’s Next? That's hard to say. There's a ton of work that needs to be done to address the systemic problems with Chicago's parking, but there just aren't enough people working on the problem. If you're interested in helping out, you should definitely download the data and start playing around with it! Also, if you're in Chicago, you should also check out ChiHackNight on Tuesday nights. Just make sure to poke some of us parking ticket nerds beforehand in #tickets on CHiHackNight's slack. That said, a lot of these issues should really be looked at by Chicago in more depth. The system used to manage parking ticket information - CANVAS - has cost Chicago around $200,000,000, and yet it's very clear that nobody from Chicago is interested in diving into the data. In fact, based on an old FOIA request, Chicago has only done a single spreadsheet's worth of analysis with its parking ticket data. I hope that some day Chicago starts automating some ticket validation workflows, but it will probably take a lot of public advocacy to compel them to do so. Unrelated Notes Recently I filed suit against Chicago after a FOIA request for the columns and table names of CANVAS was was rejected with a network security exemption. If successful, I hope that the information can be used to submit future FOIA requests using the database schema as the foundation. I'm extremely happy with the argument I made, and I plan on writing about it as the case comes to a close. Anywho, if you like this post and want to see more like it, please consider donating to my non-profit, Free Our Info, NFP which recently became a 501(c)(3). Much of the FOIA work from this blog (much of which is unwritten) is being filtered over to the NFP, so donations will allow similar work to continue. A majority of the proceedings will go directly into paying for FOIA litigation. So, the more donations I receive, the more I can hold government agencies accountable! Hope you enjoyed. Tags: tickets, foia, chicago, bokeh, visualization
Background In my last post, I wrote about my adventure of requesting metadata for both phone calls and emails from the City of Chicago Office of the Mayor. The work there - and its associated frustration - sent me down a path of sending requests throughout the US to both learn whether these sorts of problems are systemic (megaspoiler: they are) and to also start mapping communication across the United States. Since then, I’ve submitted over a hundred requests for email metadata across the United States – at least two per state. The first large batch of requests for email metadata were sent to the largest cities of fourteen arbitrary states in a trial run of sorts. In the end of that batch, only two cities were willing to continue with the request - Houston and Seattle. Houston complied surprisingly quickly and snail mailed the metadata for 6m emails. Seattle on the other hand... The Request On April 2, 2017 I sent this fairly boilerplate request to Seattle's IT department: For all emails sent to/from any Seattle owned email address in 2017, please provide the following information: 1. From address 2. To address 3. bcc addresses 4. cc addresses 5. Time 6. Date Technically this request can done with a single line powershell command. At a policy level, though, it usually gets a lot of pushback. Seattle's first response included a bit of gobsmackery that I’ve almost become used to: Based on my preliminary research, there have been 5.5 million emails sent and 26.8 million emails received by seattle.gov email addresses in the past 90 days. This is a significant amount of records that will need to be reviewed prior to sharing them with you. Do you have a more targeted list of email addresses you might be interested in? If not, I will work to find out how long review will take and will be in touch. Of course, I still want the records, so - I would like to stick with this request as-is for all ~32m emails. Since this request is for metadata only, the amount of review needed should be relatively small. Fee Estimate of 33 Million Dollars A week later, I received this glorious response. Each paragraph is interesting in itself, so let’s break most of it down piece by piece. Rewording of request This acknowledges receipt of your public disclosure request C012032-041017 received on April 03, 2017 regarding: All emails sent to/from any Seattle owned email address in 2017 including metadata:1. From address2. To address3. bcc addresses4. cc addresses5. Time6. Date Notice the change of language from the original wording. Their rewording completely changes the scope of the request so that it's not just for metadata, but also the emails contents. No idea why they did that. Salary Fees With that being said, Seattle IT estimates spending 30 seconds to two minutes to review each email. It has been estimated that this work will take approximately 320 years of staff time at an expense of $33 million in salary. Wot. Normally, a flustered public records officer would just reject a giant request for being for “unduly burdensome”… but this sort of estimate is practically unheard of. So much so that other FOIA nerds have told me that this is the second biggest request they've ever seen. The passive aggression is thick. Needless to say, it's not something I'm willing to pay for! The estimate of 30s per metadata entry is also a bit suspect. Especially with the use of Excel, which would be useful for removing duplicates, etc. Storage Fees We estimate that this request contains 8-10 terabytes of information, for which we could need to stand up an FTP server through which the requester will be able to download cleared email meta data. As allowed under RCW 42.56.120, we would charge the requester for the actual copying costs of fulfilling this request. Based on the Seattle IT cost model used for internal City charge backs, the anticipated cost to the requester is $2,480 per year plus $2.11 per gigabyte of storage. We are still working on the storage requirements for this effort. If we assume 10 TB of storage, this would require $21,606.40/year in requester fees. Heh. Any sysadmin can tell you that The costs of storage doesn’t exactly come from the storage medium itself; administration costs, supporting hardware, etc, are the bulk of the costs. But come on, let’s be realistic here. There’s very little room for good faith in their cost estimates – especially since the last time a single gigabyte cost that much was between 2002 and 2004. That said – some other interesting things going on here: Their file size estimation is huge. For comparison, that Houston’s email metadata dump was only 1.2GB. The fact that they mention “meta data” [sic] implies that they did acknowledge that the request was for metadata. Seattle already uses Amazon S3 to store public records requests’ data. At the time, S3 was charging $.023/GB Continue anyway? At this time, the City anticipates that it will be able to provide a first installment of records on or about May 29, 2017. However, please note that this time estimate may change depending on the clarification you provide and as we continue to process your request. If the City does not hear from you within 30 days, the City will consider your request closed. Oddly, they don't actually close out the request and instead ask whether I wanted to continue or not. I responded to their amazing email by asking how many records I'd receive on May 29th, but never received an answer back. Reversal of the Original Cost Estimate On June 5, they sent a new response admitting that their initial fee estimation was wrong, and asked for $1.25 for two days’ (out of three months) of records: At this point I can send you an exel spreadsheet with the data points you are requesting. The cost for the first installment is $1.25 for emails sent or received on January 1 and 2, 2017. Because the spreadsheet does not contain the body of the email and just the metadata that you requested, no review will be necessary, and we'll be able to get this information to you at a faster pace than the 320 years quoted you earlier. Because they're asking for a single check for $1.25 for just two days’ worth of metadata – and wouldn’t send anything until that first check came in, my interpretation is that they’re taking a page from /r/maliciouscompliance and just making this request as painful as possible just for the simple sake of making it difficult. So in response, I preemptively sent them fourteen separate checks. The first thirteen checks were all around ~$1.25. That seemed to work, since they never asked for a single payment afterwards. From there, my inbox went mostly silent for two months, and I mostly forgot about the request, though they eventually cashed all of the checks and made me an account for their public records portal. SNAFU Fast forward to August 22, when I randomly added that email account back to my phone. Unexpectedly, it turned out they actually finished the request! And without a bill for millions of dollars! Sure enough, their public records request portal had about 400 files available to download, which all in all contained metadata for about 32 million emails. Neat! Problem though... they accidentally included the first 256 characters of all 32 million emails. Here are some things I found in the emails: Usernames and passwords. Credit card numbers. Social security numbers and drivers licenses. Ongoing police investigations and arrest reports. Texts of cheating husbands to their lovers. FBI Investigations. Zabbix alerts. In other words... they just leaked to me a massive dataset filled with intimately private information. In the process, they very likely broke many laws, including the Privacy Act of 1974 and many of WA's own public records laws. Frankly, I'm still at a loss of words. It’s hard to say how any of this happened exactly, but odds are that a combination of request’s rewording and the original public records officer going on vacation led to a communication breakdown. I don’t want to dwell on the mistake itself, so I’ll stop it at that. Side note to Seattle's IT department - clean up your disks. You shouldn't have that many disks at 100%! Raising the Issue I responded as passively as possible in the hopes that they’d catch their mistake on their own: The responsive records are not consistent with my request and includes much more info than I initially requested. Could you please revisit this request and provide the records responsive to my initial request? Their response: The information that you requestedis located in columns: From address = column J To address = column K bcc address = column M cc address = column L Time and date = column R of the reports. The records were generatedfrom a system report and I am unable to limit the report to generate only thefields you requested. The City has no duty to create a record that doesnot exist. As such, we have provided all records responsive to yourrequest and consider your request closed. Disregarding the fact that they used a very common tactic of denying information on the basis that its disclosure would require the creation of new records… they didn’t get the point. I explained what information they leaked, and made it very clear how I was going to escalate this: Please address this matter as if it was a large data breach. For now, I will be raising this matter to the WA Office of Privacy and Data Protection. None of the files provided to me have been shared with anyone else, nor do I have any future intention of sharing. Their response: Thank you for your email and bringing this inadvertent error to our attention so quickly. We have temporarily suspended access to GovQA while we look into the cause of this issue. We are also working on reprocessing your request and anticipate providing you with corrected copies of the records you requested through GovQA next week. In the meantime, please do not review, share, copy or otherwise use these records for any purpose. We are sorry for any inconvenience. Phone Call Not too long after that, after contacting some folks on Seattle’s Open Data Slack, I found my way onto a conference phone call with both Seattle’s Chief Technology officer and their Chief Privacy Officer and we discussed what happened, and what should happen with the records. They thanked me for bringing the situation to their attention and all that, but the mood of the call was as if both parties had a knife behind their back. Somewhere towards the end of the call, I asked them if it was okay to keep the emails. Why not at least ask, right? Funny enough, in the middle of that question, my internet died and interrupted the call for the first time in the six months I lived in that house. Odd. It came back ten minutes later, and I dialed back into the conference line, but the mood of the call pretty much 180’d. They told me: All files were to be deleted. Seattle would hire Kroll to scan my hard drives to prove deletion. Agreeing to #1 and #2 would give me full legal indemnification. This isn't something I'm even remotely cool with, so we ended the call a couple minutes later, and agreed to have our lawyers speak going forward. Deleting the Files After that call, I asked my lawyer to reach out to their lawyer and was pretty much told that Seattle was approaching the problem as if they were pursuing Computer Fraud And Abuse (CFAA) charges. For information that they sent. Jiminey Cricket.. So, I deleted the files. Most of what happened next over a month or so was mostly between their lawyer and mine, so there’s not really that much for me to say. Early on I suggested that I write an affidavit that explains what happened, how I deleted the files, and I validated that the files were deleted. They mostly agreed, but still wanted to throw some silly assurance things my way – including asking me to run a bash script to overwrite any unused disk space with random bits. I eventually ran zerofree and fstrim instead, and they accepted the affidavit. No more legal threats from there. Seattle’s Reaction About a week after the phone call, a Seattle city employee contacted Seattle’s KIRO7 about the incident. In KIRO7's investigation, they learned that, Seattle hadn’t sent any disclosure of the leak - something required by WA’s public records request law. Only after their investigation did Seattle actually notify its employees about the emails leak. Link to their story (video inside). A week later, another article was published by Seattle’s Crosscut which goes into a lot of detail, including some history of Seattle's IT department. This line towards the bottom still makes me laugh a little: The buffer against potential legal and administrative chaos in this scenario is only that Chapman has turned out to be, as Armbruster described him, a "good Samaritan." Efforts to track down Chapman were not successful; Crosscut contacted several Matthew Chapmans who denied being the requester. On January 19th, Seattle's CTO, Michael Mattmiller gave his resignation. Whether his resignation is related to the email leak is hard to say, but I just think the timing makes it worth mentioning. Finally – The Metadata Starting January 26th, Seattle started sending installments of the email metadata I requested. So far they've sent 27 million emails. As of the writing of this post, there are only two departments who haven’t provided their email metadata: the Police Department and Human Services. You can download the raw data here. Some things about the dataset: It’s very messy – triple quotes, semicolons, commas, oh my. There are a millions of systems alerts. For seattle.gov → seattle.gov communication, there are two distinct metadata records. In any case, it's still somewhat workable, so I've been working on a proof of concept for its use in the greater context of public records laws. Not ready to talk much about it yet, so here's is a gephi graph of one day's worth of metadata. Its layout is Yifan Hu and filtered with a k-core minimum of 5 and a minimum degree of 5: Please reach out to me if you'd like to help model these networks. One Last Thing: Legislative Immunity Kerfuffle This last section might not be related, but the timing is interesting, so I feel it’s worth mentioning. On February 23 - between the first installment of email metadata and the second - WA’s legislature attempted to pass SB6617, a bill which removes requirements for disclosure of many of their records – including email exchanges - from WA’s public records laws. What’s particularly interesting about this events of this bill is that it took less than 24 hours from the time it was read for the first time to the time that it passed at both the House and Senate and sent to the Governor’s office. Seattle Times wrote a good article about it. Thankfully, after the WA governor’s office received over 6,300 phone calls, 100 letters, and over 12,500 emails, the governor ended up vetoing the bill. Neat. It's hard to say if that caused any sort of delay, but after a month and a half of waiting: How are the installments looking? I saw that there was some recent legislative immunity kerfuffle around emails. Is that related to any delays? And got this response: Good news. The recent Washington state legislative immunity kerfuffle will not impact your installments. We have fixed the bug that was impacting our progress and are now on our way. In fact, I'll have more records for you this week. A month later, they started sending the rest. What’s Next? The work done throughout this post has led to a massive trove of information that ought to be enormously useful in understanding the dynamics of one the US's biggest cities. A big hope in making this sort of information available to the public is that it will help in changing the dynamic of understanding what sorts of information is accessible. That said, this is just one city of many which have given me email metadata. As more of it comes through, I’ll be able to map out more and more, but the difficulty in requesting those records continues to get in the way. Once I get some of these bigger stories out of the way, I’ll start writing fewer stories and write more about public records requesting fundamentals – particularly for digital records. Next post will be about my ongoing suit against the White House OMB for email metadata from January 2017. This past Wednesday was the first court date - where the defendent's counsel never showed up. Hope you enjoyed! Tags: seattle, foia, kerfuffle, metadata
Intro Back in 2014, I had the naive goal of finding evidence of collusion between mayoral candidates. The reasoning is longwinded and boring, so I won't go into it. My plan was to find some sort of evidence through a FOIA request or two for the mayor's phone records, find zero evidence of collusion, then move onto a different project like I normally do. What came instead was a painful year and a half struggle to get a single week's worth of phone records from Chicago's Office of the Mayor. Hope you enjoy and learn something along the way! Requests to Mayor's Office My first request was simple and assumed that the mayor's office had a modern phone. On Dec 8, 2014, I sent this anonymous FOIA request to Chicago's Office of the Mayor: Please attach all of the mayor's phone records from any city-owned phones (including cellular phones) over the past 4 years. Ten days later, I received a rejection back stating they didn't have any of the mayor's phone records: A FOIA request must be directed to the department that maintains the records you are seeking. The Mayor’s Office does not have any documents responsive to your request. Then, to test testing whether it was just the mayor whose records their office didn't maintain, I sent another request to the Office of the Mayor - this time specifically for the FOIA officer's phone records and got the same response. Maybe another department has the records? VoIP Logs Request An outstanding question I had (and still have, to some extent) is whether or not server logs are accessible through FOIA. So, to kill two birds with one stone, I sent a request for VoIP server logs to Chicago's Department of Innovation and Technology (DoIT): Please attach in a standard text, compressed format, all VoIP server logs that would contain phone numbers dialed between the dates of 11/24/14 and 12/04/14 for [the mayor's phone]. Ten days later (and five days late), I received a response that my request was being reviewed. Because they were late to respond, IL FOIA says that they can no longer reject my request if it's unduly burdensome - one of the more interesting statutory pieces of IL FOIA. The phone... records? A month goes by, and they send back a two page PDF with phone numbers whose last four digits are redacted: ...along with a two and a half page letter explaining why. I really encourage you to read it. tl;dr of their response and its records: They claim that the review/redaction process would be extremely unduly burdensome - even though they were 5 days late! The pdf includes 83 separate phone calls, with 45 unique phone numbers. The last four digit of each phone number is removed. Government issued cell phones’ numbers have been removed completely for privacy reasons. Private phone numbers aren’t being redacted to the same extent as government cell phones. Government desk phones are redacted. Their response is particularly strange, because IL FOIA says: "disclosure of information that bears on the public duties of public employees and officials shall not be considered an invasion of personal privacy." With the help of my lawyer, I sent an email to Chicago explaining this... and never received a response. Time to appeal! Request for Review In many IL FOIA rejections, the following text is written at the bottom: You have a right of review of this denial by the IL Attorney General's Public Access Counselor, who can be contacted at 500 S. Second St., Springfield, IL 62706 or by telephone at (217)(558)-0486. You may also seek judicial review of a denial under 5 ILCS 140/11 of FOIA. I went the first route by submitting a Request for Review (RFR). The RFR letter can be boiled down to: They stopped responding. Redaction favors the government personnel’s privacy over individuals’, despite FOIA statute. Their response to the original request took ten days. Seven Months Later Turns out RFRs are very, very slow. So - seven months later, I received a RFR closing letter with a non-binding opinion saying that Chicago should send the records I requested. Their reason mostly boils down to Chicago not giving sufficient reason to call the request unduly burdensome. A long month later - August 11, 2015: - Chicago responds with this, saying that my original request was for VoIP server logs, which Chicago doesn’t have: [H]e requested "VoIP server logs," which the Department has established it does not possess. As a result, the City respectfully disagrees with your direction to produce records showing telephone numbers, as there is not an outstanding FOIA request for responsive records in the possession of the Department. Sure enough, his phone is pretty ancient: Image Source Do Over And so after nine months of what felt like wasted effort, I submitted another request that same day: Please attach [...] phone numbers dialed between the dates of 11/24/14 and 12/04/14 for [the Office of the Mayor] Two weeks later - this time with an on-time extension letter - I'm sent another file that looks like this: The exact same file. They even sent the same rejection reasons! Lawsuit On 12/2/2015, Loevy & Loevy filed suit against Chicago’s DoIT. The summary of the complaint is that we disagree with their claim that to review and redact the phone records would be extremely unduly burdensome. My part in this was waiting while my lawyer did all the work. I wasn't really involved in this part, so there's really not much for me to write about. Lawsuit conclusion Six months later, on May 11, 2016, the city settled and gave me four pages of phone logs - most of which were still redacted. Some battles, eh? Interesting bits from the court document: ...DoIT and its counsel became aware that, in its August 24, 2015 response, DoIT had inadvertently misidentified the universe of responsive numbers. DoIT identified approximately 130 additional phone numbers dialed from the phones dialed within Suite 507 of City Hall, bringing the total to 171 ...FOIA only compels the production of listed numbers belonging to businesses, governmental agencies and other entities, and only those numbers which are not work-issued cell phones. ...DoIT asserted that compliance with plaintiff request was overly burdensome pursuant to Section 3(g) of FOIA. On those grounds —rather than provide no numbers at all — DoIT redacted the last four digits of all phone numbers provided ...in other words, they "googled" each number to determine whether that number was publically listed, and, if so, to whom it belonged. This resulted in the identification of 57 out of the 137 responsive numbers... Lawsuit Records All in all, the phone records contained: 171 unique phone log entries: 57 unredacted and 114 redacted. 32 unique unredacted phone numbers. 44 unique redacted phone numbers. From there, there really wasn't much to work with. Most of the phone calls were day-to-day calls to places like flower shops, doctors and restaurants. Still, some numbers are interesting: A four-hour hotel: Prestige Club: Aura Investigative services: Statewide Investigative Services and Kennealy & O'Callaghanh Michael Madigan Data: Lawsuit Records Going deeper With all of that done – a year and a half in total for one request - I wasn’t feeling satisfied and dug deeper. This time, I started approaching it methodically to build a toolchain of sorts. So, to determine whether the same length of time could be requested without another lawsuit: Please provide me with the to/from telephone numbers, duration, time and date of all calls dialed from 121 N La Salle St #507, Chicago, IL 60602 for the below dates. April 6-9, 2015 November 23-25, 2015 And sure enough, two weeks later, I received two pdfs with phone records – this time with times, dates, from number and call length! Much faster now! Still, it’s lame that they’re still redacting a lot, and there wasn't anything interesting in these records. Data: Long Distance, Local Full year of records How about for a full year for a small set of previously released phone numbers? Please provide to me, for the year of 2014, the datetime and dialed-from number for the below numbers from the [Office of the Mayor] (312) 942-1222 [Statewide Investigative Services] (505) 747-4912 [Azura Investigations] (708) 272-6000 [Aura - Prestige Club] (773) 783-4855 [Kennealy & O'Callaghanh] (312) 606-9999 [Siam Rice] (312) 553-5698 [Some guy named Norman Kwong] Again, success! Siam Rice: 55 calls! Statewide Investigative Services: 8 calls Keannealy & O'Callaghanh: 10 calls Some guy named Norman Kwong: 3 calls (Interestingly enough, they didn't give me the prestige club phone numbers. Heh!) Data: Full year records Full year of records - City hall And finally, another request for previously released numbers – across all of city hall in 2014 and 2015: The phone numbers, names and call times to and from the phone numbers listed below during 2014 and 2015 [within city hall] (312) 942-1222 [Statewide Investigative Services] (773) 783-4855 [Kennealy & O'Callaghanh] (312) 346-4321 [Madigan & Getzendanner] (773) 581-8000 [Michael Madigan] (708) 272-6000 [Aura - Prestige Club] Data: City hall full year This means that a few methods of retrieving phone records are possible: A week's worth of (mostly redacted) records from a high profile office. A phone records through an office as big as the City Hall. The use of requesting unredacted phone numbers for future requests. Phone directory woes Of the 27 distinct Chicago phone numbers found within the last request’s records, only five of them could be resolved to a phone number found in Chicago’s phone directory: DEAL, AARON J KLINZMAN, GRANT T EMANUEL, RAHM NELSON, ASHLI RENEE MAGANA, JASMINE M This is a problem that I haven't solved for yet, but it should be easy enough by requesting a full phone directory from Chicago's DoIT. Anyone up for that challenge? ;) Emails? This probably deserves its own blog post, but I wanted tease it a bit, because it leads into other posts. I sent this request with the presumption that the redaction of emails would take a very long time: From all emails sent from [the Office of the Mayor] between 11/24/14 and 12/04/14, please provide me to all domain names for all email addresses in the to/cc/bcc. From each email, include the sender's address and sent times. Two months later, I received a 1,751 page document with full email addresses for to, from cc, and bcc, including the times of 18,860 separate emails to and from the mayor’s office. Neat - it only took about a year and a half to figure out how to parse the damn thing, though.... Interestingly, the mayor's email address isn't in there that often... Data: Email Metadata What's next? This whole process was a complete and total pain. The usefulness of knowing the ongoings of our government - especially at its highest levels - are critical for ensuring that our government is open and honest. It really shouldn't have been this difficult, but it was. The difficulties led me down an interesting path of doing many similar requests - and boy are there stories. Next post: The time Seattle accidentally sent me 30m emails for ~$30. Code Scraping Chicago's phone directory Parsing 1,700 page email pdf Tags: foia, phone, email, data, chicago
More in programming
What does it mean when someone writes that a programming language is “strongly typed”? I’ve known for many years that “strongly typed” is a poorly-defined term. Recently I was prompted on Lobsters to explain why it’s hard to understand what someone means when they use the phrase. I came up with more than five meanings! how strong? The various meanings of “strongly typed” are not clearly yes-or-no. Some developers like to argue that these kinds of integrity checks must be completely perfect or else they are entirely worthless. Charitably (it took me a while to think of a polite way to phrase this), that betrays a lack of engineering maturity. Software engineers, like any engineers, have to create working systems from imperfect materials. To do so, we must understand what guarantees we can rely on, where our mistakes can be caught early, where we need to establish processes to catch mistakes, how we can control the consequences of our mistakes, and how to remediate when somethng breaks because of a mistake that wasn’t caught. strong how? So, what are the ways that a programming language can be strongly or weakly typed? In what ways are real programming languages “mid”? Statically typed as opposed to dynamically typed? Many languages have a mixture of the two, such as run time polymorphism in OO languages (e.g. Java), or gradual type systems for dynamic languages (e.g. TypeScript). Sound static type system? It’s common for static type systems to be deliberately unsound, such as covariant subtyping in arrays or functions (Java, again). Gradual type systems migh have gaping holes for usability reasons (TypeScript, again). And some type systems might be unsound due to bugs. (There are a few of these in Rust.) Unsoundness isn’t a disaster, if a programmer won’t cause it without being aware of the risk. For example: in Lean you can write “sorry” as a kind of “to do” annotation that deliberately breaks soundness; and Idris 2 has type-in-type so it accepts Girard’s paradox. Type safe at run time? Most languages have facilities for deliberately bypassing type safety, with an “unsafe” library module or “unsafe” language features, or things that are harder to spot. It can be more or less difficult to break type safety in ways that the programmer or language designer did not intend. JavaScript and Lua are very safe, treating type safety failures as security vulnerabilities. Java and Rust have controlled unsafety. In C everything is unsafe. Fewer weird implicit coercions? There isn’t a total order here: for instance, C has implicit bool/int coercions, Rust does not; Rust has implicit deref, C does not. There’s a huge range in how much coercions are a convenience or a source of bugs. For example, the PHP and JavaScript == operators are made entirely of WAT, but at least you can use === instead. How fancy is the type system? To what degree can you model properties of your program as types? Is it convenient to parse, not validate? Is the Curry-Howard correspondance something you can put into practice? Or is it only capable of describing the physical layout of data? There are probably other meanings, e.g. I have seen “strongly typed” used to mean that runtime representations are abstract (you can’t see the underlying bytes); or in the past it sometimes meant a language with a heavy type annotation burden (as a mischaracterization of static type checking). how to type So, when you write (with your keyboard) the phrase “strongly typed”, delete it, and come up with a more precise description of what you really mean. The desiderata above are partly overlapping, sometimes partly orthogonal. Some of them you might care about, some of them not. But please try to communicate where you draw the line and how fuzzy your line is.
(Last week's newsletter took too long and I'm way behind on Logic for Programmers revisions so short one this time.1) In classical logic, two operators F/G are duals if F(x) = !G(!x). Three examples: x || y is the same as !(!x && !y). <>P ("P is possibly true") is the same as ![]!P ("not P isn't definitely true"). some x in set: P(x) is the same as !(all x in set: !P(x)). (1) is just a version of De Morgan's Law, which we regularly use to simplify boolean expressions. (2) is important in modal logic but has niche applications in software engineering, mostly in how it powers various formal methods.2 The real interesting one is (3), the "quantifier duals". We use lots of software tools to either find a value satisfying P or check that all values satisfy P. And by duality, any tool that does one can do the other, by seeing if it fails to find/check !P. Some examples in the wild: Z3 is used to solve mathematical constraints, like "find x, where f(x) >= 0. If I want to prove a property like "f is always positive", I ask z3 to solve "find x, where !(f(x) >= 0), and see if that is unsatisfiable. This use case powers a LOT of theorem provers and formal verification tooling. Property testing checks that all inputs to a code block satisfy a property. I've used it to generate complex inputs with certain properties by checking that all inputs don't satisfy the property and reading out the test failure. Model checkers check that all behaviors of a specification satisfy a property, so we can find a behavior that reaches a goal state G by checking that all states are !G. Here's TLA+ solving a puzzle this way.3 Planners find behaviors that reach a goal state, so we can check if all behaviors satisfy a property P by asking it to reach goal state !P. The problem "find the shortest traveling salesman route" can be broken into some route: distance(route) = n and all route: !(distance(route) < n). Then a route finder can find the first, and then convert the second into a some and fail to find it, proving n is optimal. Even cooler to me is when a tool does both finding and checking, but gives them different "meanings". In SQL, some x: P(x) is true if we can query for P(x) and get a nonempty response, while all x: P(x) is true if all records satisfy the P(x) constraint. Most SQL databases allow for complex queries but not complex constraints! You got UNIQUE, NOT NULL, REFERENCES, which are fixed predicates, and CHECK, which is one-record only.4 Oh, and you got database triggers, which can run arbitrary queries and throw exceptions. So if you really need to enforce a complex constraint P(x, y, z), you put in a database trigger that queries some x, y, z: !P(x, y, z) and throws an exception if it finds any results. That all works because of quantifier duality! See here for an example of this in practice. Duals more broadly "Dual" doesn't have a strict meaning in math, it's more of a vibe thing where all of the "duals" are kinda similar in meaning but don't strictly follow all of the same rules. Usually things X and Y are duals if there is some transform F where X = F(Y) and Y = F(X), but not always. Maybe the category theorists have a formal definition that covers all of the different uses. Usually duals switch properties of things, too: an example showing some x: P(x) becomes a counterexample of all x: !P(x). Under this definition, I think the dual of a list l could be reverse(l). The first element of l becomes the last element of reverse(l), the last becomes the first, etc. A more interesting case is the dual of a K -> set(V) map is the V -> set(K) map. IE the dual of lived_in_city = {alice: {paris}, bob: {detroit}, charlie: {detroit, paris}} is city_lived_in_by = {paris: {alice, charlie}, detroit: {bob, charlie}}. This preserves the property that x in map[y] <=> y in dual[x]. And after writing this I just realized this is partial retread of a newsletter I wrote a couple months ago. But only a partial retread! ↩ Specifically "linear temporal logics" are modal logics, so "eventually P ("P is true in at least one state of each behavior") is the same as saying !always !P ("not P isn't true in all states of all behaviors"). This is the basis of liveness checking. ↩ I don't know for sure, but my best guess is that Antithesis does something similar when their fuzzer beats videogames. They're doing fuzzing, not model checking, but they have the same purpose check that complex state spaces don't have bugs. Making the bug "we can't reach the end screen" can make a fuzzer output a complete end-to-end run of the game. Obvs a lot more complicated than that but that's the general idea at least. ↩ For CHECK to constraint multiple records you would need to use a subquery. Core SQL does not support subqueries in check. It is an optional database "feature outside of core SQL" (F671), which Postgres does not support. ↩
Omarchy 2.0 was released on Linux's 34th birthday as a gift to perhaps the greatest open-source project the world has ever known. Not only does Linux run 95% of all servers on the web, billions of devices as an embedded OS, but it also turns out to be an incredible desktop environment! It's crazy that it took me more than thirty years to realize this, but while I spent time in Apple's walled garden, the free software alternative simply grew better, stronger, and faster. The Linux of 2025 is not the Linux of the 90s or the 00s or even the 10s. It's shockingly more polished, capable, and beautiful. It's been an absolute honor to celebrate Linux with the making of Omarchy, the new Linux distribution that I've spent the last few months building on top of Arch and Hyprland. What began as a post-install script has turned into a full-blown ISO, dedicated package repository, and flourishing community of thousands of enthusiasts all collaborating on making it better. It's been improving rapidly with over twenty releases since the premiere in late June, but this Version 2.0 update is the biggest one yet. If you've been curious about giving Linux a try, you're not afraid of an operating system that asks you to level up and learn a little, and you want to see what a totally different computing experience can look and feel like, I invite you to give it a go. Here's a full tour of Omarchy 2.0.
In 2020, Apple released the M1 with a custom GPU. We got to work reverse-engineering the hardware and porting Linux. Today, you can run Linux on a range of M1 and M2 Macs, with almost all hardware working: wireless, audio, and full graphics acceleration. Our story begins in December 2020, when Hector Martin kicked off Asahi Linux. I was working for Collabora working on Panfrost, the open source Mesa3D driver for Arm Mali GPUs. Hector put out a public call for guidance from upstream open source maintainers, and I bit. I just intended to give some quick pointers. Instead, I bought myself a Christmas present and got to work. In between my university coursework and Collabora work, I poked at the shader instruction set. One thing led to another. Within a few weeks, I drew a triangle. In 3D graphics, once you can draw a triangle, you can do anything. Pretty soon, I started work on a shader compiler. After my final exams that semester, I took a few days off from Collabora to bring up an OpenGL driver capable of spinning gears with my new compiler. Over the next year, I kept reverse-engineering and improving the driver until it could run 3D games on macOS. Meanwhile, Asahi Lina wrote a kernel driver for the Apple GPU. My userspace OpenGL driver ran on macOS, leaving her kernel driver as the missing piece for an open source graphics stack. In December 2022, we shipped graphics acceleration in Asahi Linux. In January 2023, I started my final semester in my Computer Science program at the University of Toronto. For years I juggled my courses with my part-time job and my hobby driver. I faced the same question as my peers: what will I do after graduation? Maybe Panfrost? I started reverse-engineering of the Mali Midgard GPU back in 2017, when I was still in high school. That led to an internship at Collabora in 2019 once I graduated, turning into my job throughout four years of university. During that time, Panfrost grew from a kid’s pet project based on blackbox reverse-engineering, to a professional driver engineered by a team with Arm’s backing and hardware documentation. I did what I set out to do, and the project succeeded beyond my dreams. It was time to move on. What did I want to do next? Finish what I started with the M1. Ship a great driver. Bring full, conformant OpenGL drivers to the M1. Apple’s drivers are not conformant, but we should strive for the industry standard. Bring full, conformant Vulkan to Apple platforms, disproving the myth that Vulkan isn’t suitable for Apple hardware. Bring Proton gaming to Asahi Linux. Thanks to Valve’s work for the Steam Deck, Windows games can run better on Linux than even on Windows. Why not reap those benefits on the M1? Panfrost was my challenge until we “won”. My next challenge? Gaming on Linux on M1. Once I finished my coursework, I started full-time on gaming on Linux. Within a month, we shipped OpenGL 3.1 on Asahi Linux. A few weeks later, we passed official conformance for OpenGL ES 3.1. That put us at feature parity with Panfrost. I wanted to go further. OpenGL (ES) 3.2 requires geometry shaders, a legacy feature not supported by either Arm or Apple hardware. The proprietary OpenGL drivers emulate geometry shaders with compute, but there was no open source prior art to borrow. Even though multiple Mesa drivers need geometry/tessellation emulation, nobody did the work to get there. My early progress on OpenGL was fast thanks to the mature common code in Mesa. It was time to pay it forward. Over the rest of the year, I implemented geometry/tessellation shader emulation. And also the rest of the owl. In January 2024, I passed conformance for the full OpenGL 4.6 specification, finishing up OpenGL. Vulkan wasn’t too bad, either. I polished the OpenGL driver for a few months, but once I started typing a Vulkan driver, I passed 1.3 conformance in a few weeks. What remained was wiring up the geometry/tessellation emulation to my shiny new Vulkan driver, since those are required for Direct3D. Et voilà, Proton games. Along the way, Karol Herbst passed OpenCL 3.0 conformance on the M1, running my compiler atop his “rusticl” frontend. Meanwhile, when the Vulkan 1.4 specification was published, we were ready and shipped a conformant implementation on the same day. After that, I implemented sparse texture support, unlocking Direct3D 12 via Proton. …Now what? Ship a great driver? Check. Conformant OpenGL 4.6, OpenGL ES 3.2, and OpenCL 3.0? Check. Conformant Vulkan 1.4? Check. Proton gaming? Check. That’s a wrap. We’ve succeeded beyond my dreams. The challenges I chased, I have tackled. The drivers are fully upstream in Mesa. Performance isn’t too bad. With the Vulkan on Apple myth busted, conformant Vulkan is now coming to macOS via LunarG’s KosmicKrisp project building on my work. Satisfied, I am now stepping away from the Apple ecosystem. My friends in the Asahi Linux orbit will carry the torch from here. As for me? Onto the next challenge!
TokyoDev has published a number of different guides on coming to Japan to work as a software developer. But what if you’re already employed in another industry in Japan, and are considering changing your career to software development? I interviewed four people who became developers after they moved to Japan, for their advice and personal experiences on: Why they chose development How they switched careers How they successfully found their first jobs What mistakes they made in the job hunt The most important advice they give to others Why switch to software development? A lifelong goal For Yuta Asakura, a career in software was the dream all along. “I’ve always wanted to work with computers,” he said, “but due to financial difficulties, I couldn’t pursue a degree in computer science. I had to start working early to support my single mother. As the eldest child, I focused on helping my younger brother complete his education.” To support his family, Asakura worked in construction for eight years, eventually becoming a foreman in Yokohama. Meanwhile, his brother graduated, and became a software engineer after joining the Le Wagon Tokyo bootcamp. About a year before his brother graduated, Asakura began to delve back into development. “I had already begun self-studying in my free time by taking online courses and building small projects,” he explained. “ I quickly became hooked by how fun and empowering it was to learn, apply, and build. It wasn’t always easy. There were moments I wanted to give up, but the more I learned, the more interesting things I could create. That feeling kept me going.” What truly inspired me was the idea of creating something from nothing. Coming from a construction background, I was used to building things physically. But I wanted to create things that were digital, scalable, borderless, and meaningful to others. An unexpected passion As Andrew Wilson put it, “Wee little Andrew had a very digital childhood,” full of games and computer time. Rather than pursuing tech, however, he majored in Japanese and moved to Japan in 2012, where he initially worked as a language teacher and recruiter before settling into sales. Wilson soon discovered that sales wasn’t really his strong suit. “At the time I was selling three different enterprise software solutions.” So I had to have a fairly deep understanding of that software from a user perspective, and in the course of learning about these products and giving technical demonstrations, I realized that I liked doing that bit of my job way more than I liked actually trying to sell these things. Around that time, he also realized he didn’t want to manually digitize the many business cards he always collected during sales meetings: “That’s boring, and I’m lazy.” So instead, he found a business card-scanning app, made a spreadsheet to contain the data, automated the whole process, and shared it internally within his company. His manager approached him soon afterwards, saying, “You built this? We were looking to hire someone to do this!” Encouraged, Wilson continued to develop it. “As soon as I was done with work,” he explained with a laugh, “I was like, ‘Oh boy, I can work on my spreadsheet!’” As a result, Wilson came to the conclusion that he really should switch careers and pursue his passion for programming. Similarly to Wilson, Malcolm Hendricks initially focused on Japanese. He came to Japan as an exchange student in 2002, and traveled to Japan several more times before finally relocating in 2011. Though his original role was as a language teacher, he soon found a job at a Japanese publishing company, where he worked as an editor and writer for seven years. However, he felt burned out on the work, and also that he was in danger of stagnating; since he isn’t Japanese, the road to promotion was a difficult one. He started following some YouTube tutorials on web development, and eventually began creating websites for his friends. Along the way, he fell in love with development, on both a practical and a philosophical level. “There’s another saying I’ve heard here and there—I don’t know exactly who to attribute it to—but the essence of it goes that ‘Computer science is just teaching rocks how to think,’” Hendricks said. “My mentor Bob has been guiding me through the very fundamentals of computer science, down to binary calculations, Boolean logic, gate theory, and von Neumann architecture. He explains the fine minutia and often concludes with, ‘That’s how it works. There’s no magic to it.’ “Meanwhile, in the back of my mind, I can’t help but be mystified at the things we are all now able to do, such as having video calls from completely different parts of the world, or even me here typing on squares of plastic to make letters appear on a screen that has its own source of light inside it. . . . [It] sounds like the highest of high-fantasy wizardry to me.” I’ve always had a love for technomancy, but I never figured I might one day get the chance to be a technomancer myself. And I love it! We have the ability to create nigh unto anything in the digital world. A practical solution When Paulo D’Alberti moved to Japan in 2019, he only spoke a little Japanese, which limited his employment prospects. With his prior business experience, he landed an online marketing role for a blockchain startup, but eventually exited the company to pursue a more stable work environment. “But when I decided to leave the company,” D’Alberti said, “my Japanese was still not good enough to do business. So I was at a crossroads.” Do I decide to join a full-time Japanese language course, aiming to get JLPT N2 or the equivalent, and find a job on the business side? . . . Or do I say screw it and go for a complete career change and get skills in something more technical, that would allow me to carry those skills [with me] even if I were to move again to another country?” The portability of a career in development was a major plus for D’Alberti. “That was one of the big reasons. Another consideration was that, looking at the boot camps that were available, the promise was ‘Yeah, we’ll teach you to be a software developer in nine weeks or two months.’ That was a much shorter lead time than getting from JLPT N4 to N2. I definitely wouldn’t be able to do that in two months.” Since D’Alberti had family obligations, the timeline for his career switch was crucial. “We still had family costs and rent and groceries and all of that. I needed to find a job as soon as possible. I actually already at that point had been unsuccessfully job hunting for two months. So that was like, ‘Okay, the savings are winding up, and we are running out of options. I need to make a decision and make it fast.’” How to switch careers Method 1: Software Development Bootcamp Under pressure to find new employment quickly, D’Alberti decided to enter the Le Wagon Coding Bootcamp in Tokyo. Originally, he wavered between Le Wagon and Code Chrysalis, which has since ended its bootcamp programs. “I went with Le Wagon for two reasons,” he explained. “There were some scheduling reasons. . . . But the main reason was that Code Chrysalis required you to pass a coding exam before being admitted to their bootcamp.” Since D’Alberti was struggling to learn development by himself, he knew his chances of passing any coding exam were slim. “I tried Code Academy, I tried Solo Learn, I tried a whole bunch of apps online, I would follow the examples, the exercises . . . nothing clicked. I wouldn’t understand what I was doing or why I was doing it.” At the time, Le Wagon only offered full-time web development courses, although they now also have part-time courses and a data science curriculum. Since D’Alberti was unemployed, a full-time program wasn’t a problem for him, “But it did mean that the people who were present were very particular [kinds] of people: students who could take some time off to add this to their [coursework], or foreigners who took three months off and were traveling and decide to come here and do studying plus sightseeing, and I think there were one or two who actually asked for time off from the job in order to participate.” It was a very intense course, and the experience itself gave me exactly what I needed. I had been trying to learn by myself. It did not work. I did not understand. [After joining], the first day or second day, suddenly everything clicked. D’Alberti appreciated how Le Wagon organized the curriculum to build continuously off previous lessons. By the time he graduated in June of 2019, he’d built three applications from scratch, and felt far more confident in his coding abilities. “It was great. [The curriculum] was amazing, and I really felt super confident in my abilities after the three months. Which, looking back,” he joked, “I still had a lot to learn.” D’Alberti did have some specific advice for those considering a bootcamp: “Especially in the last couple of weeks, it can get very dramatic. You are divided into teams and as a team, you’re supposed to develop an application that you will be demonstrating in front of other people.” Some of the students, D’Alberti explained, felt that pressure intensely; one of his classmates broke down in tears. “Of course,” he added, “one of the big difficulties of joining a bootcamp is economical. The bootcamp itself is quite expensive.” While between 700,000 and 800,000 yen when D’Alberti went through the bootcamp, Le Wagon’s tuition has now risen to 890,000 yen for Web Development and 950,000 for Data Science. At the time D’Alberti joined there was no financial assistance. Now, Le Wagon has an agreement with Hello Work, so that students who are enrolled in the Hello Work system can be reimbursed for up to 70 percent of the bootcamp’s tuition. Though already studying development by himself, Asakura also enrolled in Le Wagon Tokyo in 2024, “to gain structure and accountability,” he said. One lesson that really stayed with me came from Sylvain Pierre, our bootcamp director. He said, ‘You stop being a developer the moment you stop learning or coding.’ That mindset helped me stay on track. Method 2: Online computer science degree Wilson considered going the bootcamp route, but decided against it. He knew, from his experience in recruiting, that a degree would give him an edge—especially in Japan, where having the right degree can make a difference in visa eligibility “The quality of bootcamps is perfectly fine,” he explained. “If you go through a bootcamp and study hard, you can get a job and become a developer no problem. I wanted to differentiate myself on paper as much as I could . . . [because] there are a lot of smart, motivated people who go through a bootcamp.” Whether it’s true or not, whether it’s valid or not, if you take two candidates who are very similar on paper, and one has a coding bootcamp and one has a degree, from a typical Japanese HR perspective, they’re going to lean toward the person with the degree. “Whether that’s good or not, that’s sort of a separate situation,” Wilson added. “But the reality [is] I’m older and I’m trying to make a career change, so I want to make sure that I’m giving myself every advantage that I can.” For these reasons, Wilson opted to get his computer science degree online. “There’s a program out of the University of Oregon, for people who already had a Bachelor’s degree in a different subject to get a Bachelor’s degree in Computer Science. “Because it’s limited to people who already have a Bachelor’s degree, that means you don’t need to take any non-computer science classes. You don’t need any electives or prerequisites or anything like that.” As it happened, Wilson was on paternity leave when he started studying for his degree. “That was one of my motivations to finish quickly!” he said. In the end, with his employer’s cooperation, he extended his paternity leave to two years, and finished the degree in five quarters. Method 3: Self-taught Hendricks took a different route, combining online learning materials with direct experience. He primarily used YouTube tutorials, like this project from one of his favorite channels, to teach himself. Once he had the basics down, he started creating websites for friends, as well as for the publishing company he worked for at the time. With every site, he’d put his name at the bottom of the page, as a form of marketing. This worked well enough that Hendricks was able to quit his work at the translation company and transition to full-time freelancing. However, eventually the freelancing work dried up, and he decided he wanted to experience working at a tech company—and not just for job security reasons. Hendricks saw finding a full-time development role as the perfect opportunity to push himself and see just how far he could get in his new career. There’s a common trope, probably belonging more to the sports world at large, about the importance of shedding ‘blood, sweat, and tears’ in the pursuit of one’s passion . . . and that’s also how I wanted to cut my teeth in the software engineering world. The job hunt While all four are now successfully employed as developers, Asakura, D’Alberti, Wilson, and Hendricks approached and experienced the job hunt differently. Following is their hard-earned advice on best practices and common mistakes. DO network When Hendricks started his job hunt, he faced the disadvantages of not having any formal experience, and also being both physically and socially isolated from other developers. Since he and his family were living in Nagano, he wasn’t able to participate in most of the tech events and meet-ups available in Tokyo or other big cities. His initial job hunt took around a year, and at one point he was sending so many applications that he received a hundred rejections in a week. It wasn’t until he started connecting with the community that he was able to turn it around, eventually getting three good job offers in a single week. Networking, for me, is what made all the difference. It was through networking that I found my mentors, found community, and joined and even started a few great Discord servers. These all undeniably contributed to me ultimately landing my current job, but they also made me feel welcome in the industry. Hendricks particularly credits his mentors, Ean More and Bob Cousins, for giving him great advice. “My initial mentor [Ean More] I actually met through a mutual IT networking Facebook group. I noticed that he was one of the more active members, and that he was always ready to lend a hand to help others with their questions and spread a deeper understanding of programming and computer science. He also often posted snippets of his own code to share with the community and receive feedback, and I was interested in a lot of what he was posting. “I reached out to him and told him I thought it was amazing how selfless he was in the group, and that, while I’m still a junior, if there was ever any grunt work I could do under his guidance, I would be happy to do so. Since he had a history of mentoring others, he offered to do so for me, and we’ve been mentor/mentee and friends ever since.” “My other mentor [Bob Cousins],” Hendricks continued, “was a friend of my late uncle’s. My uncle had originally begun mentoring me shortly before his passing. We were connected through a mutual friend whom I lamented to about not having any clue how to continue following the path my uncle had originally laid before me. He mentioned that he knew just the right person and gave me an email address to contact. I sent an email to the address and was greeted warmly by the man who would become another mentor, and like an uncle to me.” Although Hendricks found him via a personal connection, Cousins runs a mentorship program that caters to a wide variety of industries. Wilson also believes in the power of networking—and not just for the job hunt. “One of the things I like about programming,” he said, “is that it’s a very collaborative community. Everybody wants to help everybody.” We remember that everyone had to start somewhere, and we’ll take time to help those starting out. It’s a very welcoming community. Just do it! We’re all here for you, and if you need help I’ll refer you. Asakura, by contrast, thinks that networking can help, but that it works a little differently in Japan than in other countries. “Don’t rely on it too much,” he said. “Unlike in Western countries, personal referrals don’t always lead directly to job opportunities in Japan. Your skills, effort, and consistency will matter more in the long run.” DO treat the job hunt like a job Once he’d graduated from Le Wagon, D’Alberti said, “I considered job-hunting my full-time job.” I checked all the possible networking events and meetup events that were going on in the city, and tried to attend all of them, every single day. I had a list of 10 different job boards that I would go and just refresh on a daily basis to see, ‘Okay, Is there anything new now?’ And, of course, I talked with recruiters. D’Alberti suggests beginning the search earlier than you think you need to. “I had started actively job hunting even before graduating [from Le Wagon],” he said. “That’s advice I give to everyone who joins the bootcamp. “Two weeks before graduation, you have one simple web application that you can show. You have a second one you’re working on in a team, and you have a third one that you know what it’s going to be about. So, already, there are three applications that you can showcase or you can use to explain your skills. I started going to meetups and to different events, talking with people, showing my CV.” The process wasn’t easy, as most companies and recruiters weren’t interested in hiring for junior roles. But his intensive strategy paid off within a month, as D’Albert landed three invitations to interview: one from a Japanese job board, one from a recruiter, and one from LinkedIn. For Asakura, treating job hunting like a job was as much for his mental health as for his career. “The biggest challenge was dealing with impostor syndrome and feeling like I didn’t belong because I didn’t have a computer science degree,” he explained. “I also experienced burnout from pushing myself too hard.” To cope, I stuck to a structured routine. I went to the gym daily to decompress, kept a consistent study schedule as if I were working full-time, and continued applying for jobs even when it felt hopeless. At first, Asakura tried to apply to jobs strategically by tracking each application, tailoring his resume, and researching every role. “But after dozens of rejections,” he said, “I eventually switched to applying more broadly and sent out over one hundred applications. I also reached out to friends who were already software engineers and asked for direct referrals, but unfortunately, nothing worked out.” Still, Asakura didn’t give up. He practiced interviews in both English and Japanese with his friends, and stayed in touch with recruiters. Most importantly, he kept developing and adding to his portfolio. DO make use of online resources “What ultimately helped me was staying active and visible,” Asakura said. I consistently updated my GitHub, LInkedIn, and Wantedly profiles. Eventually, I received a message on Wantedly from the CTO of a company who was impressed with my portfolio, and that led to my first developer job.” “If you have the time, certifications can also help validate your knowledge,” Asakura added, “especially in fields like cloud and AI. Some people may not realize this, but the rise of artificial intelligence is closely tied to the growth of cloud computing. Earning certifications such as AWS, Kubernetes, and others can give you a strong foundation and open new opportunities, especially as these technologies continue to evolve.” Hendricks also heavily utilized LinkedIn and similar sites, though in a slightly different way. “I would also emphasize the importance of knowing how to use job-hunting sites like Indeed and LinkedIn,” he said. “I had the best luck when I used them primarily to do initial research into companies, then applied directly through the companies’ own websites, rather than through job postings that filter applicants before their resumés ever make it to the actual people looking to hire.” In addition, Hendricks recommends studying coding interview prep tutorials from freeCodeCamp. Along with advice from his mentors and the online communities he joined, he credits those tutorials with helping him successfully receive offers after a long job hunt. DO highlight experience with Japanese culture and language Asakura felt that his experience in Japan, and knowledge of Japanese, gave him an edge. “I understand Japanese work culture [and] can speak the language,” Asakura said, “and as a Japanese national I didn’t require visa sponsorship. That made me a lower-risk hire for companies here.” Hendricks also felt that his excellent Japanese made him a more attractive hire. While applying, he emphasized to companies that he could be a bridge to the global market and business overseas. However, he also admitted this strategy steered him towards applying with more domestic Japanese companies, which were also less likely to hire someone without a computer science degree. “So,” he said, “it sort of washed out.” Wilson is another who put a lot of emphasis on his Japanese language skills, from a slightly different angle. A lot of interviewees typically don’t speak Japanese well . . . and a lot of companies here say that they’re very international, but if they want very good programmers, [those people] spend their lives programming, not studying English. So having somebody who can bridge the language gap on the IT side can be helpful. DO lean into your other experience Several career switchers discovered that their past experiences and skills, while not immediately relevant to their new career, still proved quite helpful in landing that first role—sometimes in very unexpected ways. When Wilson was pitching his language skills to companies, he wasn’t talking about just Japanese–English translation. He also highlighted his prior experience in sales to suggest that he could help communicate with and educate non-technical audiences. “Actually to be a software engineer, there’s a lot of technical communication you have to do.” I have worked with some incredible coders who are so good at the technical side and just don’t want to do the personal side. But for those of us who are not super-geniuses and can’t rely purely on our tech skills . . . there’s a lot of non-technical discussion that goes around building a product.” This strategy, while eventually fruitful, didn’t earn Wilson a job right away. Initially, he applied to more than sixty companies over the course of three to four months. “I didn’t have any professional [coding] experience, so it was actually quite a rough time,” he said. “I interviewed all over the place. I was getting rejected all over town.” The good news was, Wilson said, “I’m from Chicago. I don’t know what it is, but there are a lot of Chicagoans who work in Tokyo for whatever reason.” When he finally landed an interview, one of the three founders of the company was also from Chicago, giving them something in common. “We hit it off really well in the interview. I think that kind of gave me the edge to get the role, to be honest.” Like Wilson, D’Alberti found that his previous work as a marketer helped him secure his first developer role—which was ironic, he felt, given that he’d partially chosen to switch careers because he hadn’t been able to find an English-language marketing job in Japan. “I had my first interview with the CEO,” he told me, “and this was for a Japanese startup that was building chatbots, and they wanted to expand into the English market. So I talked with the CEO, and he was very excited to get to know me and sent me to talk with the CTO.” The CTO, unfortunately, wasn’t interested in hiring a junior developer with no professional experience. “And I thought that was the end of it. But then I got called again by the CEO. I wanted to join for the engineering position, and he wanted to have me for my marketing experience.” In the end we agreed that I would join in a 50-50 arrangement. I would do 50 percent of my job in marketing and going to conferences and talking to people, and 50 percent on the engineering side. I was like, ‘Okay, I’ll take that.’ This ended up working better than D’Alberti had expected, partially due to external circumstances. “When COVID came, we couldn’t travel abroad, so most of the job I was doing in my marketing role I couldn’t perform anymore. “So they sat me down and [said], ‘What are we going to do with you, since we cannot use you for marketing anymore?’ And I was like, ‘Well, I’m still a software developer. I could continue working in that role.’ And that actually allowed me to fully transition.” DON’T make these mistakes It was D’Alberti’s willingness to compromise on that first development role that led to his later success, so he would explicitly encourage other career-changers to avoid, in his own words, “being too picky.” This advice is based, not just on his own experience, but also on his time working as a teaching assistant at Le Wagon. “There were a couple of people who would be like, ‘Yeah, I’d really like to find a job and I’m not getting any interviews,’” he explained. “And then we’d go and ask, ‘Okay, how many companies are you applying to? What are you doing?’ But [they’d say] ‘No, see, [this company] doesn’t offer enough’ or ‘I don’t really like this company’ or ‘I’d like to do something else.’ Those who would be really picky or wouldn’t put in the effort, they wouldn’t land a job. Those who were deadly serious about ‘I need to get a job as a software developer,’ they’d find one. It might not be a great job, it might not be at a good company, but it would be a good first start from which to move on afterwards. Asakura also knew some other bootcamp graduates who struggled to find work. “A major reason was a lack of Japanese language skills,” he said. Even for junior roles, many companies in Japan require at least conversational Japanese, especially domestic ones. On the other hand, if you prioritize learning Japanese, that can give you an edge on entering the industry: “Many local companies are open to training junior developers, as long as they see your motivation and you can communicate effectively. International companies, on the other hand, often have stricter technical requirements and may pass on candidates without degrees or prior experience.” Finally, Hendricks said that during his own job hunt, “Not living in Tokyo was a problem.” It was something that he was able to overcome via diligent digital networking, but he’d encourage career-changers to think seriously about their future job prospects before settling outside a major metropolis in Japan. Their top advice I asked each developer to share their number one piece of advice for career-changers. D’Alberti wasn’t quite sure what to suggest, given recent changes in the tech market overall. “I don’t have clear advice to someone who’s trying to break into tech right now,” he said. “It might be good to wait and see what happens with the AI path. Might be good to actually learn how to code using AI, if that’s going to be the way to distinguish yourself from other junior developers. It might be to just abandon the idea of [being] a linear software developer in the traditional sense, and maybe look more into data science, if there are more opportunities.” But assuming they still decide ‘Yes, I want to join, I love the idea of being a software developer and I want to go forward’ . . . my main suggestion is patience. “It’s going to be tough,” he added. By contrast, Hendricks and Wilson had the same suggestion: if you want to change careers, then go for it, full speed ahead. “Do it now, or as soon as you possibly can,” Hendricks stated adamantly. His life has been so positively altered by discovering and pursuing his passion, that his only regret is he didn’t do it sooner. Wilson said something strikingly similar. “Do it. Just do it. I went back and forth a lot,” he explained. “‘Oh, should I do this, it’s so much money, I already have a job’ . . . just rip the bandaid off. Just do it. You probably have a good reason.” He pointed out that while starting over and looking for work is scary, it’s also possible that you’ll lose your current job anyway, at which point you’ll still be job hunting but in an industry you no longer even enjoy. “If you keep at it,” he said, “you can probably do it.” “Not to talk down to developers,” he added, “but it’s not the hardest job in the world. You have to study and learn and be the kind of person who wants to sit at the computer and write code, but if you’re thinking about it, you’re probably the kind of person who can do it, and that also means you can probably weather the awful six months of job hunting.” You only need to pass one job interview. You only need to get your foot in the door. Asakura agreed with “just do it,” but with a twist. “Build in public,” he suggested. “Share your progress. Post on GitHub. Keep your LinkedIn active.” Let people see your journey, because even small wins build momentum and credibility. “To anyone learning to code right now,” Asakura added, “don’t get discouraged by setbacks or rejections. Focus on building, learning, and showing up every day. Your portfolio speaks louder than your past, and consistency will eventually open the door.” If you want to read more how-tos and success stories around networking, working with recruitment agencies, writing your resume, etc., check out TokyoDev’s other articles. If you’d like to hear more about being a developer in Japan, we invite you to join the TokyoDev Discord, which has over 6,000 members as well as dedicated channels for resume review, job posts, life in Japan, and more.