A tale about requesting Chicago’s Mayor’s Office’s phone records.

from Civic Hax [alt+shift+b] in programming

Intro Back in 2014, I had the naive goal of finding evidence of collusion between mayoral candidates. The reasoning is longwinded and boring, so I won't go into it. My plan was to find some sort of evidence through a FOIA request or two for the mayor's phone records, find zero evidence of collusion, then move onto a different project like I normally do. What came instead was a painful year and a half struggle to get a single week's worth of phone records from Chicago's Office of the Mayor. Hope you enjoy and learn something along the way! Requests to Mayor's Office My first request was simple and assumed that the mayor's office had a modern phone. On Dec 8, 2014, I sent this anonymous FOIA request to Chicago's Office of the Mayor: Please attach all of the mayor's phone records from any city-owned phones (including cellular phones) over the past 4 years. Ten days later, I received a rejection back stating they didn't have any of the mayor's phone records: A FOIA request must be...

over a year ago

Remove from reading list Add to reading list [alt+a] Read now [→]

Comments

Improve your reading experience

Logged in users get linked directly to articles resulting in a better reading experience. Please login for free, it takes less than 1 minute.

More from Civic Hax

Chicago Parking Ticket Visualization

Intro Hi there! In this post, I want to show off a fun little web app I made for visualizing parking tickets in Chicago, but because I've spent so much time on the overall project, I figured I'd share the story that got me to this point. In many ways, this work is the foundation for my interest in public records and transparency, so it has a very special place in my heart. If you're here just to play around with a cool app, click here. Otherwise, I hope you enjoy this post and find it interesting. Also - please don't hesitate to share harsh criticism or suggestions. Enjoy! Project Beginnings Back in 2014, while on vacation, my car was towed for allegedly being in a construction zone. The cost to get the car out was quoted at around $700 through tow fees and storage fees that increase each day by $35. That's obviously a lot of money, so the night I noticed the car was towed, I began looking for ways to get the ticket thrown out in court. After some digging, I found a dataset on data.cityofchicago.org which details every street closure, including closures for construction. As luck would have it, the address that my car was towed at had zero open construction permits, which seemed like a good reason to throw out the fines. The next day I confirmed the lack of permits by calling Chicago's permit office - they mentioned they only thing found was a canceled construction permit by Comed. That same day, I scheduled a court date for the following day. When I arrived in the court room, I was immediately pulled aside by a city employee and was told that my case was being thrown out. They explained that the ticket itself "didn't have enough information", but besides that I wasn't told much! The judge then signed a form that allowed the release of my car at zero cost. At this point, I'd imagine most folk would just walk away with some frustration, but would be overall happy and move on. For whatever reason, though, I'm too stubborn for that - I couldn't get over the fact that Chicago didn't just spend five minutes checking the ticket before I appeared in court. And what bothered me more was that there isn't some sort of sorts automated check. Surely Chicago has some process in place within its $200 million ticketing system to make sure it's not giving out invalid tickets... Spoiler: it doesn't. The more I thought about it, the more I started wondering why Chicago's court system only favors those who have the time and resources to go to court and started wondering if it was possible to automate these checks myself using historical data. The hunch started based on what I'd seen from the recent "Open Gov" movement in to start making data open to the public by default. When I first started, I mostly just worked with some already publicly available datasets previously released by several news organizations. WBEZ in particular released a fairly large dataset of towing data that, while it was a compelling dataset, it unfortunately ultimately wasn't useful for my little project. So instead, I started working on a similar goal to programatically find invalid tickets in Chicago. That work and its research led to my very first FOIA request, where after some trial and error, received the records for 14m parking tickets. Trial Development and Trial Geocoding With the data at hand, I started throwing many, many unix one liners and gnuplot at the problem. This method worked well enough for small one-off checks, but for anything else, it's a complete mess of sed and awk statements. After unix-by-default I moved on to Python, where I made a small batch of surprisingly powerful, but scraggly scripts. One script, for example, would check to see whether a residential parking permit ticket was valid by comparing a ticket's address against a set of streets and its ranges found on data.cityofchicago.org. The script ended up finding a lot of residential permit tickets given outside of the sign's listed boundaries. Sadly though, none of the findings were relevant, since a ticket is given based on the physical location of the sign, not what the sign's location data represents. These sorts or problems ended up putting a halt to these sorts of scripts. With the next development iteration, I switched to visualizing parking tickets, where I essentially continued until today. Problem was, the original data had no lat/lng and had to be geocoded.. So, back when I first started doing this, there weren't that many geocoding services that were reasonably priced. Google was certainly possible, but its licensing and usage limits made Google a total no-go. The other two options I found were from Equifax (expensive) and the US Post Office (also expensive). So I went down a several hundred hour rabbit hole of attempting to geocode all addresses myself - entirely under the philosophy that all tickets had a story to tell, and anything lower than 90% geocoding was inexecusable. The first attempt at geocoding was decent, but only matched about 30% of tickets. It worked by tokenizing addresses using a wonderful python library called usaddress, then comparing the "string distance" of a street address against a known set of address already paired with lat/long. It got me to a point where I could make my first map visualization using some simple plotting: Jupyter Notebook Still, with only 30% geocoded, I felt that I could do better through many different methods that continued to use string distance - some implementations much fancier than others. Other methods, ahem, used lots of sed and vim, which I'm still not proud of. There were some attempts where I got close to 90% of addresses geocoded, though I ended up throwing each one out in fear that the lack of validation would bite me in the future. Spoiler: it did. Eventually, my life was made much easier thanks to the folks at SmartyStreets, who set me up an account with unlimited geocoding requests. By itself, SmartyStreets was able to geocode around 50% of the addresses, but when combined with some of my string-distance autocorrection, that number shot up to 70%. It wasn't perfect, but it was "good enough" to move on, and didn't really require me to trust my own set corrections and the anxiety that comes with that. Another New Dataset Fast forward about six months - I hadn't gotten much further on visualization, but instead started focusing on my first blog post. A bit around that post, ProPublica published some excellent work on parking tickets and ended up releasing its own tickets datasets to the public, one of which was geocoded. I'm happy to say I had a small part in helping out with, and then started using it for my own work in the hopes of using a common dataset between groups. More Geocoding Fun A high level of ProPublica's geocoding process is described in pretty good detail here, so if you're interested in how the geocoding was done, you should check that out. In brief, the dataset's addresses were geocoded using Geocodio. The geocoding process they used was dramatically simplified by replacing each address's last two digits with zeroes. It worked surprisingly well, and brought the percent of geocoded tickets to 99%, but this method had the unfortunate side effect of reducing mapping accuracy. So, because I have more opinions than I care to admit about geocoding, I had to look into how well the geocoding was done. Geocoding Strangeness After a few checks against the geocoded results, I noticed that a large portion of of the addresses wasn't geocoded correctly - often in strange ways. One example comes from the tendency of geocoders to favor one direction over the other for some streets. As can be seen below, the geocoding done on Michigan and Wells is completely demolished. What’s particularly notable here is that Michigan Ave has zero instances of “S Michigan” swapped to “N Michigan”. N Wells S Wells Total Total Tickets (pre-geocoding) 280,160 127,064 407,224 Unique Addresses (pre-geocoding) 3,254 3,190 6,444 Total Tickets (post-geocoding) 342,020 37,611 379,631 Unique Addresses (post-geocoding) 40 57 97 Direction is swapped 246 54,309 54,555 N Michigan S Michigan Total Total Tickets (pre-geocoding) 102,225 392,391 494,616 Unique Addresses (pre-geocoding) 2,916 16,585 19,501 Total Tickets (post-geocoding) 15,760 509,485 512,401 Unique Addresses (post-geocoding) 136 144 280 Direction is swapped 102,225 0 102,225 Interestingly, the total ticket count for Wells drops by about 30k after being geocoding. This turned out to be from Geocodio strangely renaming ‘200 S Wells’ to ‘200 W Hill St’. Similar also happens with numbered addresses similar to “83rd St”, where a steet name is often renamed to "100th". 100th street's ticket count appears to have six times as many tickets as it actually does because of this. These are just a few examples, but it's probably safe to say it's tip of the iceberg. My gut tells me that the address simplification had a major effect here. I think a simple solution of replacing the last two digits with 01 if odd, and 02 if even - instead of 0s - would do the trick. Quick Geocoding Story A few years ago, I approached the (ex) Chicago Chief of Open Data and asked if he, or someone in Chicago, was interested in a copy of geocoded addresses I’d worked on, since they don’t actually have anything geocoded (heh). He politely declined, and mentioned that since Chicago has its own geocoder, my geocoded addresses weren’t useful. I then asked if Chicago could run the parking ticket addresses through their geocoder – something that would then be requestable through FOIA. His response was, “No, because that would require a for loop”. Hmmmmm. Visualization to Validate Geocoding In one of my stranger attempts to validate the geocoding results, I wanted to see whether it was possible to use the distance a ticketer traveled to check if a particular geocode was botched or not. It ended up being less useful I'd hoped, but out of it, I made some pretty interesting looking visualizations: Left: Ticketer paths every 15 minutes with 24 hours decay (Animated). Right: All ticketer paths for a full month. This validation code, while it wasn't even remotely useful, ended up being the foundation for what I'm calling my final personal attempt to visualize Chicago's parking tickets. Final Version And with that, here is the final version of the parking ticket visualization app. Click the image below to navigate to the app. Bokeh Frontend The frontend for the app uses a python framework named Bokeh. It's an extremely powerful interactive visualization framework that dramatically reduces how much code it takes to make both simple and complex interfaces. It's not without its (many) faults, but in python land, I personally consider it to be the best available. That said, I wish the Bokeh devs would invest a lot more time improving the serverside documentation. There's just way too much guesswork, buggy behavior, and special one-offs required to match Bokeh's non-server functionality. Flask/Pandas Backend The original backend was PostgreSQL along with PostGIS (for GeoJSON creation) and TimeScaleDB (for timeseries charting). Sadly, PostgreSQL started struggling with larger queries, combined with memory exhausting itself quicker than I'd hoped. In the end, I ended up writing a Flask service that runs in PyPy with Pandas as an in-memory data store. This ended up working surprisingly well in both speed and modularity. That said, there’s still a lot more room for speed improvement - especially in caching! If I ever revisit the backend, I'll try giving PostgreSQL another chance, but throw a bunch more optimization its way. Please feel free to check out the application code. Interesting Findings To wrap this post up, I wanted to share some of the things I've found while playing around with the app. Everything below comes from screenshots from the app, so pardon any loss of information. Snow Route Tickets Outside 3AM-7AM If you park your car in a "snow route" between 3AM-7AM, you will get a ticket. Interestingly, between the hours of 8am to 11pm, over 1,000 tickets have been given over the past 15 years - mostly by CPD. Of these, only 6% have been contested – 6 of which the car owner was still liable for some reason. The other 94% should have automatically been thrown out, but Chicago has no systematic way of doing this. Shame. Highest value line = CPD's tickets. Expired Meter Within Central Business District – Outside of CBD. Downtown Chicago has an area that's officially described as the “Central Business District”. Within this area, tickets for expired meters end up costing $65 instead of the normal $50. In the past 13 years, over 52k of these tickets have been given outside of the Central Business District. From those 52k, only 5k tickets were taken to court, and 70% were thrown out. Overall, Chicago made around $725k off the $15 difference. Another instance of where Chicago could have systematically thrown out many tickets, but didn't. Downtown removed to highlight non-central business district areas. City sticker expiration tickets.. in the middle of July? Up until 2015, Chicago's city stickers expired at the end of June. A massive spike of tickets is bound to happen, but what's strange is that the spike happens in the middle of July, not the start of it. At the onset of each spikes, Chicago is making somewhere between $500k and $1m. This is a $200 ticket that has a nonstop effect on Chicago's poorer citizens. For more on this, check out this great ProPublica article. (The two different colors here come from the fact that Chicago changed the ticket description sometime in 2012.) Bike Lane Tickets With the increasing prevalence of bike lanes and the danger from cars parked in them, I was expecting to see a lot of tickets for parking in a bike lane. Unfortunately, that’s not the case. For example, in 2017, only around 3,000 tickets for parking in a bike lane were given. For more on this, click here. Wards Have different Street Cleaning times I found some oddities while searching for street cleaning tickets given at odd hours. Wards 48, 49 and 50 seem to give out parking tickets much later than the rest of Chicago. Then, starting at 6am, Ward 40 starts giving out tickets earlier than the rest of Chicago. Posted signs probably make it relatively clear that there’s going to be street cleaning, but it’s also probably fair to say that this is understandably confusing. Hopefully there’s a good reason these wards decided to be special. What’s Next? That's hard to say. There's a ton of work that needs to be done to address the systemic problems with Chicago's parking, but there just aren't enough people working on the problem. If you're interested in helping out, you should definitely download the data and start playing around with it! Also, if you're in Chicago, you should also check out ChiHackNight on Tuesday nights. Just make sure to poke some of us parking ticket nerds beforehand in #tickets on CHiHackNight's slack. That said, a lot of these issues should really be looked at by Chicago in more depth. The system used to manage parking ticket information - CANVAS - has cost Chicago around $200,000,000, and yet it's very clear that nobody from Chicago is interested in diving into the data. In fact, based on an old FOIA request, Chicago has only done a single spreadsheet's worth of analysis with its parking ticket data. I hope that some day Chicago starts automating some ticket validation workflows, but it will probably take a lot of public advocacy to compel them to do so. Unrelated Notes Recently I filed suit against Chicago after a FOIA request for the columns and table names of CANVAS was was rejected with a network security exemption. If successful, I hope that the information can be used to submit future FOIA requests using the database schema as the foundation. I'm extremely happy with the argument I made, and I plan on writing about it as the case comes to a close. Anywho, if you like this post and want to see more like it, please consider donating to my non-profit, Free Our Info, NFP which recently became a 501(c)(3). Much of the FOIA work from this blog (much of which is unwritten) is being filtered over to the NFP, so donations will allow similar work to continue. A majority of the proceedings will go directly into paying for FOIA litigation. So, the more donations I receive, the more I can hold government agencies accountable! Hope you enjoyed. Tags: tickets, foia, chicago, bokeh, visualization

over a year ago • 40 votes

That Time the City of Seattle Accidentally Gave Me 32m Emails for 40 Dollars

Background In my last post, I wrote about my adventure of requesting metadata for both phone calls and emails from the City of Chicago Office of the Mayor. The work there - and its associated frustration - sent me down a path of sending requests throughout the US to both learn whether these sorts of problems are systemic (megaspoiler: they are) and to also start mapping communication across the United States. Since then, I’ve submitted over a hundred requests for email metadata across the United States – at least two per state. The first large batch of requests for email metadata were sent to the largest cities of fourteen arbitrary states in a trial run of sorts. In the end of that batch, only two cities were willing to continue with the request - Houston and Seattle. Houston complied surprisingly quickly and snail mailed the metadata for 6m emails. Seattle on the other hand... The Request On April 2, 2017 I sent this fairly boilerplate request to Seattle's IT department: For all emails sent to/from any Seattle owned email address in 2017, please provide the following information: 1. From address 2. To address 3. bcc addresses 4. cc addresses 5. Time 6. Date Technically this request can done with a single line powershell command. At a policy level, though, it usually gets a lot of pushback. Seattle's first response included a bit of gobsmackery that I’ve almost become used to: Based on my preliminary research, there have been 5.5 million emails sent and 26.8 million emails received by seattle.gov email addresses in the past 90 days. This is a significant amount of records that will need to be reviewed prior to sharing them with you. Do you have a more targeted list of email addresses you might be interested in? If not, I will work to find out how long review will take and will be in touch. Of course, I still want the records, so - I would like to stick with this request as-is for all ~32m emails. Since this request is for metadata only, the amount of review needed should be relatively small. Fee Estimate of 33 Million Dollars A week later, I received this glorious response. Each paragraph is interesting in itself, so let’s break most of it down piece by piece. Rewording of request This acknowledges receipt of your public disclosure request C012032-041017 received on April 03, 2017 regarding: All emails sent to/from any Seattle owned email address in 2017 including metadata:1. From address2. To address3. bcc addresses4. cc addresses5. Time6. Date Notice the change of language from the original wording. Their rewording completely changes the scope of the request so that it's not just for metadata, but also the emails contents. No idea why they did that. Salary Fees With that being said, Seattle IT estimates spending 30 seconds to two minutes to review each email. It has been estimated that this work will take approximately 320 years of staff time at an expense of $33 million in salary. Wot. Normally, a flustered public records officer would just reject a giant request for being for “unduly burdensome”… but this sort of estimate is practically unheard of. So much so that other FOIA nerds have told me that this is the second biggest request they've ever seen. The passive aggression is thick. Needless to say, it's not something I'm willing to pay for! The estimate of 30s per metadata entry is also a bit suspect. Especially with the use of Excel, which would be useful for removing duplicates, etc. Storage Fees We estimate that this request contains 8-10 terabytes of information, for which we could need to stand up an FTP server through which the requester will be able to download cleared email meta data. As allowed under RCW 42.56.120, we would charge the requester for the actual copying costs of fulfilling this request. Based on the Seattle IT cost model used for internal City charge backs, the anticipated cost to the requester is $2,480 per year plus $2.11 per gigabyte of storage. We are still working on the storage requirements for this effort. If we assume 10 TB of storage, this would require $21,606.40/year in requester fees. Heh. Any sysadmin can tell you that The costs of storage doesn’t exactly come from the storage medium itself; administration costs, supporting hardware, etc, are the bulk of the costs. But come on, let’s be realistic here. There’s very little room for good faith in their cost estimates – especially since the last time a single gigabyte cost that much was between 2002 and 2004. That said – some other interesting things going on here: Their file size estimation is huge. For comparison, that Houston’s email metadata dump was only 1.2GB. The fact that they mention “meta data” [sic] implies that they did acknowledge that the request was for metadata. Seattle already uses Amazon S3 to store public records requests’ data. At the time, S3 was charging $.023/GB Continue anyway? At this time, the City anticipates that it will be able to provide a first installment of records on or about May 29, 2017. However, please note that this time estimate may change depending on the clarification you provide and as we continue to process your request. If the City does not hear from you within 30 days, the City will consider your request closed. Oddly, they don't actually close out the request and instead ask whether I wanted to continue or not. I responded to their amazing email by asking how many records I'd receive on May 29th, but never received an answer back. Reversal of the Original Cost Estimate On June 5, they sent a new response admitting that their initial fee estimation was wrong, and asked for $1.25 for two days’ (out of three months) of records: At this point I can send you an exel spreadsheet with the data points you are requesting. The cost for the first installment is $1.25 for emails sent or received on January 1 and 2, 2017. Because the spreadsheet does not contain the body of the email and just the metadata that you requested, no review will be necessary, and we'll be able to get this information to you at a faster pace than the 320 years quoted you earlier. Because they're asking for a single check for $1.25 for just two days’ worth of metadata – and wouldn’t send anything until that first check came in, my interpretation is that they’re taking a page from /r/maliciouscompliance and just making this request as painful as possible just for the simple sake of making it difficult. So in response, I preemptively sent them fourteen separate checks. The first thirteen checks were all around ~$1.25. That seemed to work, since they never asked for a single payment afterwards. From there, my inbox went mostly silent for two months, and I mostly forgot about the request, though they eventually cashed all of the checks and made me an account for their public records portal. SNAFU Fast forward to August 22, when I randomly added that email account back to my phone. Unexpectedly, it turned out they actually finished the request! And without a bill for millions of dollars! Sure enough, their public records request portal had about 400 files available to download, which all in all contained metadata for about 32 million emails. Neat! Problem though... they accidentally included the first 256 characters of all 32 million emails. Here are some things I found in the emails: Usernames and passwords. Credit card numbers. Social security numbers and drivers licenses. Ongoing police investigations and arrest reports. Texts of cheating husbands to their lovers. FBI Investigations. Zabbix alerts. In other words... they just leaked to me a massive dataset filled with intimately private information. In the process, they very likely broke many laws, including the Privacy Act of 1974 and many of WA's own public records laws. Frankly, I'm still at a loss of words. It’s hard to say how any of this happened exactly, but odds are that a combination of request’s rewording and the original public records officer going on vacation led to a communication breakdown. I don’t want to dwell on the mistake itself, so I’ll stop it at that. Side note to Seattle's IT department - clean up your disks. You shouldn't have that many disks at 100%! Raising the Issue I responded as passively as possible in the hopes that they’d catch their mistake on their own: The responsive records are not consistent with my request and includes much more info than I initially requested. Could you please revisit this request and provide the records responsive to my initial request? Their response: The information that you requestedis located in columns: From address = column J To address = column K bcc address = column M cc address = column L Time and date = column R of the reports. The records were generatedfrom a system report and I am unable to limit the report to generate only thefields you requested. The City has no duty to create a record that doesnot exist. As such, we have provided all records responsive to yourrequest and consider your request closed. Disregarding the fact that they used a very common tactic of denying information on the basis that its disclosure would require the creation of new records… they didn’t get the point. I explained what information they leaked, and made it very clear how I was going to escalate this: Please address this matter as if it was a large data breach. For now, I will be raising this matter to the WA Office of Privacy and Data Protection. None of the files provided to me have been shared with anyone else, nor do I have any future intention of sharing. Their response: Thank you for your email and bringing this inadvertent error to our attention so quickly. We have temporarily suspended access to GovQA while we look into the cause of this issue. We are also working on reprocessing your request and anticipate providing you with corrected copies of the records you requested through GovQA next week. In the meantime, please do not review, share, copy or otherwise use these records for any purpose. We are sorry for any inconvenience. Phone Call Not too long after that, after contacting some folks on Seattle’s Open Data Slack, I found my way onto a conference phone call with both Seattle’s Chief Technology officer and their Chief Privacy Officer and we discussed what happened, and what should happen with the records. They thanked me for bringing the situation to their attention and all that, but the mood of the call was as if both parties had a knife behind their back. Somewhere towards the end of the call, I asked them if it was okay to keep the emails. Why not at least ask, right? Funny enough, in the middle of that question, my internet died and interrupted the call for the first time in the six months I lived in that house. Odd. It came back ten minutes later, and I dialed back into the conference line, but the mood of the call pretty much 180’d. They told me: All files were to be deleted. Seattle would hire Kroll to scan my hard drives to prove deletion. Agreeing to #1 and #2 would give me full legal indemnification. This isn't something I'm even remotely cool with, so we ended the call a couple minutes later, and agreed to have our lawyers speak going forward. Deleting the Files After that call, I asked my lawyer to reach out to their lawyer and was pretty much told that Seattle was approaching the problem as if they were pursuing Computer Fraud And Abuse (CFAA) charges. For information that they sent. Jiminey Cricket.. So, I deleted the files. Most of what happened next over a month or so was mostly between their lawyer and mine, so there’s not really that much for me to say. Early on I suggested that I write an affidavit that explains what happened, how I deleted the files, and I validated that the files were deleted. They mostly agreed, but still wanted to throw some silly assurance things my way – including asking me to run a bash script to overwrite any unused disk space with random bits. I eventually ran zerofree and fstrim instead, and they accepted the affidavit. No more legal threats from there. Seattle’s Reaction About a week after the phone call, a Seattle city employee contacted Seattle’s KIRO7 about the incident. In KIRO7's investigation, they learned that, Seattle hadn’t sent any disclosure of the leak - something required by WA’s public records request law. Only after their investigation did Seattle actually notify its employees about the emails leak. Link to their story (video inside). A week later, another article was published by Seattle’s Crosscut which goes into a lot of detail, including some history of Seattle's IT department. This line towards the bottom still makes me laugh a little: The buffer against potential legal and administrative chaos in this scenario is only that Chapman has turned out to be, as Armbruster described him, a "good Samaritan." Efforts to track down Chapman were not successful; Crosscut contacted several Matthew Chapmans who denied being the requester. On January 19th, Seattle's CTO, Michael Mattmiller gave his resignation. Whether his resignation is related to the email leak is hard to say, but I just think the timing makes it worth mentioning. Finally – The Metadata Starting January 26th, Seattle started sending installments of the email metadata I requested. So far they've sent 27 million emails. As of the writing of this post, there are only two departments who haven’t provided their email metadata: the Police Department and Human Services. You can download the raw data here. Some things about the dataset: It’s very messy – triple quotes, semicolons, commas, oh my. There are a millions of systems alerts. For seattle.gov → seattle.gov communication, there are two distinct metadata records. In any case, it's still somewhat workable, so I've been working on a proof of concept for its use in the greater context of public records laws. Not ready to talk much about it yet, so here's is a gephi graph of one day's worth of metadata. Its layout is Yifan Hu and filtered with a k-core minimum of 5 and a minimum degree of 5: Please reach out to me if you'd like to help model these networks. One Last Thing: Legislative Immunity Kerfuffle This last section might not be related, but the timing is interesting, so I feel it’s worth mentioning. On February 23 - between the first installment of email metadata and the second - WA’s legislature attempted to pass SB6617, a bill which removes requirements for disclosure of many of their records – including email exchanges - from WA’s public records laws. What’s particularly interesting about this events of this bill is that it took less than 24 hours from the time it was read for the first time to the time that it passed at both the House and Senate and sent to the Governor’s office. Seattle Times wrote a good article about it. Thankfully, after the WA governor’s office received over 6,300 phone calls, 100 letters, and over 12,500 emails, the governor ended up vetoing the bill. Neat. It's hard to say if that caused any sort of delay, but after a month and a half of waiting: How are the installments looking? I saw that there was some recent legislative immunity kerfuffle around emails. Is that related to any delays? And got this response: Good news. The recent Washington state legislative immunity kerfuffle will not impact your installments. We have fixed the bug that was impacting our progress and are now on our way. In fact, I'll have more records for you this week. A month later, they started sending the rest. What’s Next? The work done throughout this post has led to a massive trove of information that ought to be enormously useful in understanding the dynamics of one the US's biggest cities. A big hope in making this sort of information available to the public is that it will help in changing the dynamic of understanding what sorts of information is accessible. That said, this is just one city of many which have given me email metadata. As more of it comes through, I’ll be able to map out more and more, but the difficulty in requesting those records continues to get in the way. Once I get some of these bigger stories out of the way, I’ll start writing fewer stories and write more about public records requesting fundamentals – particularly for digital records. Next post will be about my ongoing suit against the White House OMB for email metadata from January 2017. This past Wednesday was the first court date - where the defendent's counsel never showed up. Hope you enjoyed! Tags: seattle, foia, kerfuffle, metadata

over a year ago • 38 votes

Using FOIA Data and Unix to halve major source of parking tickets

Intro This'll be my first blog post on the internet, ever. Hopefully it's interesting and accurate. Please point out any mistakes if you see any! In 2016, I did some work in trying to find some hotspot areas for parking tickets to see if a bit of data munging could reduce those area's parking tickets. In the end, I only really got one cleaned up, but it was one of the most-ticketed spots in all of Chicago and led to about a 50% reduction in parking tickets. Here's a bit of that story. Getting the data through FOIA: The system Chicago uses to store its parking tickets is called CANVAS. It's short for "City of Chicago Violation, Noticing and Adjudication Business Process And System Support" [sic] and managed by IBM. Its most recent contract started in 2012, expires in 2022, and has a pricetag of over $190 million. Most of Chicago's contracts and their Requests For Procurement (RFP) PDFs are published online. In CANVAS's contracts it gives a fair amount of info on CANVAS's backend infrastructure, including the fact that it uses Oracle 10g. In other words, a FOIA request can be fulfilled by IBM running some simple SQL. CANVAS Technical Spec CANVAS Contract CANVAS Request For Procurement (RFP) and Contract With that information at hand, and a couple failed FOIA requests later, I sent this request to get the parking ticket data from Jan 1, 2009 to Mar 10, 2016: "Please provide to me all possible information on all parking tickets between 2009 and the present day. This should include any information related to the car (make, etc), license plate, ticket, ticketer, ticket reason(s), financial information (paid, etc), court information (contested, etc), situational (eg, time, location), and photos/videos. Ideally, this should also include any relevant ticket-related information stored within CANVAS. [...] This information will be used for data analysis for research. Thanks in advance, Matt Chapman" Data About a month later, a guy named Carl (in a fancy suit), handed me a CD with some extremely messy data in a semicolon delimited file named A50462_TcktsIssdSince2009.txt. The file had info on 17,806,818 parking tickets and spans from Jan 1, 2009 to Mar 10, 2016. The data itself looks like this: head -5 A50462_TcktsIssdSince2009.txt Ticket Number;License Plate Number;License Plate State;License Plate Type;Ticket Make;Issue Date;Violation Location;Violation Code;Violation Description;Badge;Unit;Ticket Queue;Hearing Dispo 39596087;zzzzzz;IL;PAS;VOLV;03/03/2003 11:25 am;3849 W CONGRESS;0976160F;EXPIRED PLATES OR TEMPORARY REGISTRATION;11870;701;Paid; 40228076;zzzzzz;IL;TRK;FORD;03/01/2003 12:29 am;3448 N OKETO;0964170A;TRUCK,RV,BUS, OR TAXI RESIDENTIAL STREET;17488;016;Define; 40480875;zzzzzz;IL;PAS;PONT;03/01/2003 09:45 pm;8135 S PERRY;0964130;PARK OR BLOCK ALLEY;17575;006;Notice; 40718783;zzzzzz;IL;PAS;ISU;03/02/2003 06:02 pm;6928 S CORNELL;0976160F;EXPIRED PLATES OR TEMPORARY REGISTRATION;7296;003;Paid;Liable Some things to note about the data: The file is semicolon delimited. Each address is hand typed on a handheld device, often with gloves. There are millions of typos in the address column. Including over 50,000 semicolons! There is no lat/lon. What that amounts to is an extremely, extremely messy and unpredictable dataset that's incredibly to difficult to accurately map to lat/lon, which is needed for any sort of comprehensive GIS analysis. There are a bunch of geocoder services that can help out here, but most of them have about a 50% accuracy rate. That said, with the help of a bit of scrubbing, that number can be boosted to closer to 90%. Another post for another time. Here’s a sample list of Lake Shore Drive typos: Laks Shore Dr Lawkeshore Dr West Lkaeshore Dr Lkae Shore Dr Lkae Shore Drive Lkaeshore Dr West Original Analysis I was particularly interested in finding areas that had hotspot areas that stood out. A lot of my time was spent just throwing hacky code at the problem and eventually wrote out two series of (hacky) commands that led to identifying a potentially fixable spot. Originally, the work and analysis I was doing was with a combination of unix commands and gnuplot. Since then, I've migrated my code to python + matplotlib + SQL. But, for the sake of this blog, I wanted to show the original analysis. Get count of tickets at addresses and ignoring first two digits: $ mawk -F';' '{print $7}' all_tickets.orig.txt | sed -r 's/^([0-9]*)[0-9][0-9] (.*)/\100 \2/' | sed -r 's/ (BLVD|ST|AV|AVE|RD)$//' | sort | uniq -c | sort -nr 79320 1900 W OGDEN 60059 1100 N STATE 50594 100 N WABASH 44503 1400 N MILWAUKEE 43121 1500 N MILWAUKEE 43030 2800 N BROADWAY 42294 2100 S ARCHER 42116 1900 W HARRISON Get count of tickets, by ticket type, at specific addresses: $ mawk -F';' '{print $9,$7}' A50462_TcktsIssdSince2009.txt | sed -r 's/ (BLVD|ST|AV|AVE|RD)$//' | sort --parallel=4 | uniq -c | sort -nr 12510 EXPIRED PLATES OR TEMPORARY REGISTRATION 5050 W 55TH 9636 PARKING/STANDING PROHIBITED ANYTIME 835 N MICHIGAN 8943 EXPIRED PLATES OR TEMPORARY REGISTRATION 1 W PARKING LOT A 6168 EXPIRED PLATES OR TEMPORARY REGISTRATION 1 W PARKING LOT E 5938 PARKING/STANDING PROHIBITED ANYTIME 500 W MADISON 5663 PARK OR STAND IN BUS/TAXI/CARRIAGE STAND 1166 N STATE 5527 EXPIRED METER OR OVERSTAY 5230 S LAKE PARK 4174 PARKING/STANDING PROHIBITED ANYTIME 1901 W HARRISON 4137 REAR AND FRONT PLATE REQUIRED 1 W PARKING LOT A Both bits of code roughly show that there's something going on at 1100N state street, and 1166 N State St looks particularly suspicious.. So, have a look at the original set of signs: Things going on that make this spot confusing: This is a taxi stand from 7pm to 5am for three cars’ lengths. Parking in a taxi stand is a $100 ticket. When this spot isn’t a taxi stand, it’s metered parking – for a parking meter beyond an alleyway. It’s possible to pay for parking here after 7pm, which makes it look like parking is acceptable – especially with the “ParkChicago” sign floating there. Confusion creates more confusion – if one car parks there, then more cars follow. Cha-ching. Contacting the 2nd Ward With all that in mind, I contacted the second ward’s alderman’s office on April 12 explaining this, and got back this response: "Hello Matt, […]The signs and the ordinance are currently being investigated. In the interim, I highly recommend that you do not park there to avoid any further tickets. Lisa Ryan Alderman Brian Hopkins - 2nd Ward" The Fix On 1/11/17 I received this email from Lisa: "Matt, I don't know if you noticed the additional signage installed on State Street at the 3 Taxi Stand. This should elevate [sic] any further confusion of vehicle parking. Thank you for your patience. Lisa Ryan" Sure enough, two new signs were added! The new taxi stand sign sets a boundary for a previously unbounded taxi stand. The No Parking sign explicitly makes it clear that parking here during taxi stand hours is a fineable offense. Neat! And then there's this guy: Results I recently decided to look at the number of tickets at that spot. Armed with a new set of data from another FOIA request, I did some analysis with python, pandas, and SQL. What I found is that the addition of a new sign effectively led to a 50% reduction in parking tickets between 1150 and 1200 N State St. Adding it all up, that's about 400 tickets fewer in 2017 and 200 so far in 2018 compared to 2016. All in all, that's about $60,000 worth in parking tickets! The drop in slope to about 50% matches perfectly with Lisa’s email: And then comparing 2016 to 2017: What's next? All in all, the number of parking tickets is going up in Chicago, and this work shows that something can be done, even if small. This work is only on one small section of road, but I'm convinced that similar work can be on a systematic scale. It's mostly just a matter of digging through data and working directly with each ward. The later analysis done here was also only done on the most recent dataset that the Department of Revenue they gave me. The two datasets have a different set of columns, so the two datasets need to be combined still. I hope to accomplish that soon! Analysis Code Data used in this blog. Tags: FOIA, parkingtickets, civics, unix, python

over a year ago • 35 votes

over a year ago • 42 votes

More in programming

strongly typed?

What does it mean when someone writes that a programming language is “strongly typed”? I’ve known for many years that “strongly typed” is a poorly-defined term. Recently I was prompted on Lobsters to explain why it’s hard to understand what someone means when they use the phrase. I came up with more than five meanings! how strong? The various meanings of “strongly typed” are not clearly yes-or-no. Some developers like to argue that these kinds of integrity checks must be completely perfect or else they are entirely worthless. Charitably (it took me a while to think of a polite way to phrase this), that betrays a lack of engineering maturity. Software engineers, like any engineers, have to create working systems from imperfect materials. To do so, we must understand what guarantees we can rely on, where our mistakes can be caught early, where we need to establish processes to catch mistakes, how we can control the consequences of our mistakes, and how to remediate when somethng breaks because of a mistake that wasn’t caught. strong how? So, what are the ways that a programming language can be strongly or weakly typed? In what ways are real programming languages “mid”? Statically typed as opposed to dynamically typed? Many languages have a mixture of the two, such as run time polymorphism in OO languages (e.g. Java), or gradual type systems for dynamic languages (e.g. TypeScript). Sound static type system? It’s common for static type systems to be deliberately unsound, such as covariant subtyping in arrays or functions (Java, again). Gradual type systems migh have gaping holes for usability reasons (TypeScript, again). And some type systems might be unsound due to bugs. (There are a few of these in Rust.) Unsoundness isn’t a disaster, if a programmer won’t cause it without being aware of the risk. For example: in Lean you can write “sorry” as a kind of “to do” annotation that deliberately breaks soundness; and Idris 2 has type-in-type so it accepts Girard’s paradox. Type safe at run time? Most languages have facilities for deliberately bypassing type safety, with an “unsafe” library module or “unsafe” language features, or things that are harder to spot. It can be more or less difficult to break type safety in ways that the programmer or language designer did not intend. JavaScript and Lua are very safe, treating type safety failures as security vulnerabilities. Java and Rust have controlled unsafety. In C everything is unsafe. Fewer weird implicit coercions? There isn’t a total order here: for instance, C has implicit bool/int coercions, Rust does not; Rust has implicit deref, C does not. There’s a huge range in how much coercions are a convenience or a source of bugs. For example, the PHP and JavaScript == operators are made entirely of WAT, but at least you can use === instead. How fancy is the type system? To what degree can you model properties of your program as types? Is it convenient to parse, not validate? Is the Curry-Howard correspondance something you can put into practice? Or is it only capable of describing the physical layout of data? There are probably other meanings, e.g. I have seen “strongly typed” used to mean that runtime representations are abstract (you can’t see the underlying bytes); or in the past it sometimes meant a language with a heavy type annotation burden (as a mischaracterization of static type checking). how to type So, when you write (with your keyboard) the phrase “strongly typed”, delete it, and come up with a more precise description of what you really mean. The desiderata above are partly overlapping, sometimes partly orthogonal. Some of them you might care about, some of them not. But please try to communicate where you draw the line and how fuzzy your line is.

yesterday • 8 votes

Logical Duals in Software Engineering

(Last week's newsletter took too long and I'm way behind on Logic for Programmers revisions so short one this time.1) In classical logic, two operators F/G are duals if F(x) = !G(!x). Three examples: x || y is the same as !(!x && !y). <>P ("P is possibly true") is the same as ![]!P ("not P isn't definitely true"). some x in set: P(x) is the same as !(all x in set: !P(x)). (1) is just a version of De Morgan's Law, which we regularly use to simplify boolean expressions. (2) is important in modal logic but has niche applications in software engineering, mostly in how it powers various formal methods.2 The real interesting one is (3), the "quantifier duals". We use lots of software tools to either find a value satisfying P or check that all values satisfy P. And by duality, any tool that does one can do the other, by seeing if it fails to find/check !P. Some examples in the wild: Z3 is used to solve mathematical constraints, like "find x, where f(x) >= 0. If I want to prove a property like "f is always positive", I ask z3 to solve "find x, where !(f(x) >= 0), and see if that is unsatisfiable. This use case powers a LOT of theorem provers and formal verification tooling. Property testing checks that all inputs to a code block satisfy a property. I've used it to generate complex inputs with certain properties by checking that all inputs don't satisfy the property and reading out the test failure. Model checkers check that all behaviors of a specification satisfy a property, so we can find a behavior that reaches a goal state G by checking that all states are !G. Here's TLA+ solving a puzzle this way.3 Planners find behaviors that reach a goal state, so we can check if all behaviors satisfy a property P by asking it to reach goal state !P. The problem "find the shortest traveling salesman route" can be broken into some route: distance(route) = n and all route: !(distance(route) < n). Then a route finder can find the first, and then convert the second into a some and fail to find it, proving n is optimal. Even cooler to me is when a tool does both finding and checking, but gives them different "meanings". In SQL, some x: P(x) is true if we can query for P(x) and get a nonempty response, while all x: P(x) is true if all records satisfy the P(x) constraint. Most SQL databases allow for complex queries but not complex constraints! You got UNIQUE, NOT NULL, REFERENCES, which are fixed predicates, and CHECK, which is one-record only.4 Oh, and you got database triggers, which can run arbitrary queries and throw exceptions. So if you really need to enforce a complex constraint P(x, y, z), you put in a database trigger that queries some x, y, z: !P(x, y, z) and throws an exception if it finds any results. That all works because of quantifier duality! See here for an example of this in practice. Duals more broadly "Dual" doesn't have a strict meaning in math, it's more of a vibe thing where all of the "duals" are kinda similar in meaning but don't strictly follow all of the same rules. Usually things X and Y are duals if there is some transform F where X = F(Y) and Y = F(X), but not always. Maybe the category theorists have a formal definition that covers all of the different uses. Usually duals switch properties of things, too: an example showing some x: P(x) becomes a counterexample of all x: !P(x). Under this definition, I think the dual of a list l could be reverse(l). The first element of l becomes the last element of reverse(l), the last becomes the first, etc. A more interesting case is the dual of a K -> set(V) map is the V -> set(K) map. IE the dual of lived_in_city = {alice: {paris}, bob: {detroit}, charlie: {detroit, paris}} is city_lived_in_by = {paris: {alice, charlie}, detroit: {bob, charlie}}. This preserves the property that x in map[y] <=> y in dual[x]. And after writing this I just realized this is partial retread of a newsletter I wrote a couple months ago. But only a partial retread! ↩ Specifically "linear temporal logics" are modal logics, so "eventually P ("P is true in at least one state of each behavior") is the same as saying !always !P ("not P isn't true in all states of all behaviors"). This is the basis of liveness checking. ↩ I don't know for sure, but my best guess is that Antithesis does something similar when their fuzzer beats videogames. They're doing fuzzing, not model checking, but they have the same purpose check that complex state spaces don't have bugs. Making the bug "we can't reach the end screen" can make a fuzzer output a complete end-to-end run of the game. Obvs a lot more complicated than that but that's the general idea at least. ↩ For CHECK to constraint multiple records you would need to use a subquery. Core SQL does not support subqueries in check. It is an optional database "feature outside of core SQL" (F671), which Postgres does not support. ↩

2 days ago • 8 votes

Omarchy 2.0

Omarchy 2.0 was released on Linux's 34th birthday as a gift to perhaps the greatest open-source project the world has ever known. Not only does Linux run 95% of all servers on the web, billions of devices as an embedded OS, but it also turns out to be an incredible desktop environment! It's crazy that it took me more than thirty years to realize this, but while I spent time in Apple's walled garden, the free software alternative simply grew better, stronger, and faster. The Linux of 2025 is not the Linux of the 90s or the 00s or even the 10s. It's shockingly more polished, capable, and beautiful. It's been an absolute honor to celebrate Linux with the making of Omarchy, the new Linux distribution that I've spent the last few months building on top of Arch and Hyprland. What began as a post-install script has turned into a full-blown ISO, dedicated package repository, and flourishing community of thousands of enthusiasts all collaborating on making it better. It's been improving rapidly with over twenty releases since the premiere in late June, but this Version 2.0 update is the biggest one yet. If you've been curious about giving Linux a try, you're not afraid of an operating system that asks you to level up and learn a little, and you want to see what a totally different computing experience can look and feel like, I invite you to give it a go. Here's a full tour of Omarchy 2.0.

3 days ago • 8 votes

Dissecting the Apple M1 GPU, the end

In 2020, Apple released the M1 with a custom GPU. We got to work reverse-engineering the hardware and porting Linux. Today, you can run Linux on a range of M1 and M2 Macs, with almost all hardware working: wireless, audio, and full graphics acceleration. Our story begins in December 2020, when Hector Martin kicked off Asahi Linux. I was working for Collabora working on Panfrost, the open source Mesa3D driver for Arm Mali GPUs. Hector put out a public call for guidance from upstream open source maintainers, and I bit. I just intended to give some quick pointers. Instead, I bought myself a Christmas present and got to work. In between my university coursework and Collabora work, I poked at the shader instruction set. One thing led to another. Within a few weeks, I drew a triangle. In 3D graphics, once you can draw a triangle, you can do anything. Pretty soon, I started work on a shader compiler. After my final exams that semester, I took a few days off from Collabora to bring up an OpenGL driver capable of spinning gears with my new compiler. Over the next year, I kept reverse-engineering and improving the driver until it could run 3D games on macOS. Meanwhile, Asahi Lina wrote a kernel driver for the Apple GPU. My userspace OpenGL driver ran on macOS, leaving her kernel driver as the missing piece for an open source graphics stack. In December 2022, we shipped graphics acceleration in Asahi Linux. In January 2023, I started my final semester in my Computer Science program at the University of Toronto. For years I juggled my courses with my part-time job and my hobby driver. I faced the same question as my peers: what will I do after graduation? Maybe Panfrost? I started reverse-engineering of the Mali Midgard GPU back in 2017, when I was still in high school. That led to an internship at Collabora in 2019 once I graduated, turning into my job throughout four years of university. During that time, Panfrost grew from a kid’s pet project based on blackbox reverse-engineering, to a professional driver engineered by a team with Arm’s backing and hardware documentation. I did what I set out to do, and the project succeeded beyond my dreams. It was time to move on. What did I want to do next? Finish what I started with the M1. Ship a great driver. Bring full, conformant OpenGL drivers to the M1. Apple’s drivers are not conformant, but we should strive for the industry standard. Bring full, conformant Vulkan to Apple platforms, disproving the myth that Vulkan isn’t suitable for Apple hardware. Bring Proton gaming to Asahi Linux. Thanks to Valve’s work for the Steam Deck, Windows games can run better on Linux than even on Windows. Why not reap those benefits on the M1? Panfrost was my challenge until we “won”. My next challenge? Gaming on Linux on M1. Once I finished my coursework, I started full-time on gaming on Linux. Within a month, we shipped OpenGL 3.1 on Asahi Linux. A few weeks later, we passed official conformance for OpenGL ES 3.1. That put us at feature parity with Panfrost. I wanted to go further. OpenGL (ES) 3.2 requires geometry shaders, a legacy feature not supported by either Arm or Apple hardware. The proprietary OpenGL drivers emulate geometry shaders with compute, but there was no open source prior art to borrow. Even though multiple Mesa drivers need geometry/tessellation emulation, nobody did the work to get there. My early progress on OpenGL was fast thanks to the mature common code in Mesa. It was time to pay it forward. Over the rest of the year, I implemented geometry/tessellation shader emulation. And also the rest of the owl. In January 2024, I passed conformance for the full OpenGL 4.6 specification, finishing up OpenGL. Vulkan wasn’t too bad, either. I polished the OpenGL driver for a few months, but once I started typing a Vulkan driver, I passed 1.3 conformance in a few weeks. What remained was wiring up the geometry/tessellation emulation to my shiny new Vulkan driver, since those are required for Direct3D. Et voilà, Proton games. Along the way, Karol Herbst passed OpenCL 3.0 conformance on the M1, running my compiler atop his “rusticl” frontend. Meanwhile, when the Vulkan 1.4 specification was published, we were ready and shipped a conformant implementation on the same day. After that, I implemented sparse texture support, unlocking Direct3D 12 via Proton. …Now what? Ship a great driver? Check. Conformant OpenGL 4.6, OpenGL ES 3.2, and OpenCL 3.0? Check. Conformant Vulkan 1.4? Check. Proton gaming? Check. That’s a wrap. We’ve succeeded beyond my dreams. The challenges I chased, I have tackled. The drivers are fully upstream in Mesa. Performance isn’t too bad. With the Vulkan on Apple myth busted, conformant Vulkan is now coming to macOS via LunarG’s KosmicKrisp project building on my work. Satisfied, I am now stepping away from the Apple ecosystem. My friends in the Asahi Linux orbit will carry the torch from here. As for me? Onto the next challenge!

3 days ago • 12 votes

Changing Careers to Software Development in Japan

TokyoDev has published a number of different guides on coming to Japan to work as a software developer. But what if you’re already employed in another industry in Japan, and are considering changing your career to software development? I interviewed four people who became developers after they moved to Japan, for their advice and personal experiences on: Why they chose development How they switched careers How they successfully found their first jobs What mistakes they made in the job hunt The most important advice they give to others Why switch to software development? A lifelong goal For Yuta Asakura, a career in software was the dream all along. “I’ve always wanted to work with computers,” he said, “but due to financial difficulties, I couldn’t pursue a degree in computer science. I had to start working early to support my single mother. As the eldest child, I focused on helping my younger brother complete his education.” To support his family, Asakura worked in construction for eight years, eventually becoming a foreman in Yokohama. Meanwhile, his brother graduated, and became a software engineer after joining the Le Wagon Tokyo bootcamp. About a year before his brother graduated, Asakura began to delve back into development. “I had already begun self-studying in my free time by taking online courses and building small projects,” he explained. “ I quickly became hooked by how fun and empowering it was to learn, apply, and build. It wasn’t always easy. There were moments I wanted to give up, but the more I learned, the more interesting things I could create. That feeling kept me going.” What truly inspired me was the idea of creating something from nothing. Coming from a construction background, I was used to building things physically. But I wanted to create things that were digital, scalable, borderless, and meaningful to others. An unexpected passion As Andrew Wilson put it, “Wee little Andrew had a very digital childhood,” full of games and computer time. Rather than pursuing tech, however, he majored in Japanese and moved to Japan in 2012, where he initially worked as a language teacher and recruiter before settling into sales. Wilson soon discovered that sales wasn’t really his strong suit. “At the time I was selling three different enterprise software solutions.” So I had to have a fairly deep understanding of that software from a user perspective, and in the course of learning about these products and giving technical demonstrations, I realized that I liked doing that bit of my job way more than I liked actually trying to sell these things. Around that time, he also realized he didn’t want to manually digitize the many business cards he always collected during sales meetings: “That’s boring, and I’m lazy.” So instead, he found a business card-scanning app, made a spreadsheet to contain the data, automated the whole process, and shared it internally within his company. His manager approached him soon afterwards, saying, “You built this? We were looking to hire someone to do this!” Encouraged, Wilson continued to develop it. “As soon as I was done with work,” he explained with a laugh, “I was like, ‘Oh boy, I can work on my spreadsheet!’” As a result, Wilson came to the conclusion that he really should switch careers and pursue his passion for programming. Similarly to Wilson, Malcolm Hendricks initially focused on Japanese. He came to Japan as an exchange student in 2002, and traveled to Japan several more times before finally relocating in 2011. Though his original role was as a language teacher, he soon found a job at a Japanese publishing company, where he worked as an editor and writer for seven years. However, he felt burned out on the work, and also that he was in danger of stagnating; since he isn’t Japanese, the road to promotion was a difficult one. He started following some YouTube tutorials on web development, and eventually began creating websites for his friends. Along the way, he fell in love with development, on both a practical and a philosophical level. “There’s another saying I’ve heard here and there—I don’t know exactly who to attribute it to—but the essence of it goes that ‘Computer science is just teaching rocks how to think,’” Hendricks said. “My mentor Bob has been guiding me through the very fundamentals of computer science, down to binary calculations, Boolean logic, gate theory, and von Neumann architecture. He explains the fine minutia and often concludes with, ‘That’s how it works. There’s no magic to it.’ “Meanwhile, in the back of my mind, I can’t help but be mystified at the things we are all now able to do, such as having video calls from completely different parts of the world, or even me here typing on squares of plastic to make letters appear on a screen that has its own source of light inside it. . . . [It] sounds like the highest of high-fantasy wizardry to me.” I’ve always had a love for technomancy, but I never figured I might one day get the chance to be a technomancer myself. And I love it! We have the ability to create nigh unto anything in the digital world. A practical solution When Paulo D’Alberti moved to Japan in 2019, he only spoke a little Japanese, which limited his employment prospects. With his prior business experience, he landed an online marketing role for a blockchain startup, but eventually exited the company to pursue a more stable work environment. “But when I decided to leave the company,” D’Alberti said, “my Japanese was still not good enough to do business. So I was at a crossroads.” Do I decide to join a full-time Japanese language course, aiming to get JLPT N2 or the equivalent, and find a job on the business side? . . . Or do I say screw it and go for a complete career change and get skills in something more technical, that would allow me to carry those skills [with me] even if I were to move again to another country?” The portability of a career in development was a major plus for D’Alberti. “That was one of the big reasons. Another consideration was that, looking at the boot camps that were available, the promise was ‘Yeah, we’ll teach you to be a software developer in nine weeks or two months.’ That was a much shorter lead time than getting from JLPT N4 to N2. I definitely wouldn’t be able to do that in two months.” Since D’Alberti had family obligations, the timeline for his career switch was crucial. “We still had family costs and rent and groceries and all of that. I needed to find a job as soon as possible. I actually already at that point had been unsuccessfully job hunting for two months. So that was like, ‘Okay, the savings are winding up, and we are running out of options. I need to make a decision and make it fast.’” How to switch careers Method 1: Software Development Bootcamp Under pressure to find new employment quickly, D’Alberti decided to enter the Le Wagon Coding Bootcamp in Tokyo. Originally, he wavered between Le Wagon and Code Chrysalis, which has since ended its bootcamp programs. “I went with Le Wagon for two reasons,” he explained. “There were some scheduling reasons. . . . But the main reason was that Code Chrysalis required you to pass a coding exam before being admitted to their bootcamp.” Since D’Alberti was struggling to learn development by himself, he knew his chances of passing any coding exam were slim. “I tried Code Academy, I tried Solo Learn, I tried a whole bunch of apps online, I would follow the examples, the exercises . . . nothing clicked. I wouldn’t understand what I was doing or why I was doing it.” At the time, Le Wagon only offered full-time web development courses, although they now also have part-time courses and a data science curriculum. Since D’Alberti was unemployed, a full-time program wasn’t a problem for him, “But it did mean that the people who were present were very particular [kinds] of people: students who could take some time off to add this to their [coursework], or foreigners who took three months off and were traveling and decide to come here and do studying plus sightseeing, and I think there were one or two who actually asked for time off from the job in order to participate.” It was a very intense course, and the experience itself gave me exactly what I needed. I had been trying to learn by myself. It did not work. I did not understand. [After joining], the first day or second day, suddenly everything clicked. D’Alberti appreciated how Le Wagon organized the curriculum to build continuously off previous lessons. By the time he graduated in June of 2019, he’d built three applications from scratch, and felt far more confident in his coding abilities. “It was great. [The curriculum] was amazing, and I really felt super confident in my abilities after the three months. Which, looking back,” he joked, “I still had a lot to learn.” D’Alberti did have some specific advice for those considering a bootcamp: “Especially in the last couple of weeks, it can get very dramatic. You are divided into teams and as a team, you’re supposed to develop an application that you will be demonstrating in front of other people.” Some of the students, D’Alberti explained, felt that pressure intensely; one of his classmates broke down in tears. “Of course,” he added, “one of the big difficulties of joining a bootcamp is economical. The bootcamp itself is quite expensive.” While between 700,000 and 800,000 yen when D’Alberti went through the bootcamp, Le Wagon’s tuition has now risen to 890,000 yen for Web Development and 950,000 for Data Science. At the time D’Alberti joined there was no financial assistance. Now, Le Wagon has an agreement with Hello Work, so that students who are enrolled in the Hello Work system can be reimbursed for up to 70 percent of the bootcamp’s tuition. Though already studying development by himself, Asakura also enrolled in Le Wagon Tokyo in 2024, “to gain structure and accountability,” he said. One lesson that really stayed with me came from Sylvain Pierre, our bootcamp director. He said, ‘You stop being a developer the moment you stop learning or coding.’ That mindset helped me stay on track. Method 2: Online computer science degree Wilson considered going the bootcamp route, but decided against it. He knew, from his experience in recruiting, that a degree would give him an edge—especially in Japan, where having the right degree can make a difference in visa eligibility “The quality of bootcamps is perfectly fine,” he explained. “If you go through a bootcamp and study hard, you can get a job and become a developer no problem. I wanted to differentiate myself on paper as much as I could . . . [because] there are a lot of smart, motivated people who go through a bootcamp.” Whether it’s true or not, whether it’s valid or not, if you take two candidates who are very similar on paper, and one has a coding bootcamp and one has a degree, from a typical Japanese HR perspective, they’re going to lean toward the person with the degree. “Whether that’s good or not, that’s sort of a separate situation,” Wilson added. “But the reality [is] I’m older and I’m trying to make a career change, so I want to make sure that I’m giving myself every advantage that I can.” For these reasons, Wilson opted to get his computer science degree online. “There’s a program out of the University of Oregon, for people who already had a Bachelor’s degree in a different subject to get a Bachelor’s degree in Computer Science. “Because it’s limited to people who already have a Bachelor’s degree, that means you don’t need to take any non-computer science classes. You don’t need any electives or prerequisites or anything like that.” As it happened, Wilson was on paternity leave when he started studying for his degree. “That was one of my motivations to finish quickly!” he said. In the end, with his employer’s cooperation, he extended his paternity leave to two years, and finished the degree in five quarters. Method 3: Self-taught Hendricks took a different route, combining online learning materials with direct experience. He primarily used YouTube tutorials, like this project from one of his favorite channels, to teach himself. Once he had the basics down, he started creating websites for friends, as well as for the publishing company he worked for at the time. With every site, he’d put his name at the bottom of the page, as a form of marketing. This worked well enough that Hendricks was able to quit his work at the translation company and transition to full-time freelancing. However, eventually the freelancing work dried up, and he decided he wanted to experience working at a tech company—and not just for job security reasons. Hendricks saw finding a full-time development role as the perfect opportunity to push himself and see just how far he could get in his new career. There’s a common trope, probably belonging more to the sports world at large, about the importance of shedding ‘blood, sweat, and tears’ in the pursuit of one’s passion . . . and that’s also how I wanted to cut my teeth in the software engineering world. The job hunt While all four are now successfully employed as developers, Asakura, D’Alberti, Wilson, and Hendricks approached and experienced the job hunt differently. Following is their hard-earned advice on best practices and common mistakes. DO network When Hendricks started his job hunt, he faced the disadvantages of not having any formal experience, and also being both physically and socially isolated from other developers. Since he and his family were living in Nagano, he wasn’t able to participate in most of the tech events and meet-ups available in Tokyo or other big cities. His initial job hunt took around a year, and at one point he was sending so many applications that he received a hundred rejections in a week. It wasn’t until he started connecting with the community that he was able to turn it around, eventually getting three good job offers in a single week. Networking, for me, is what made all the difference. It was through networking that I found my mentors, found community, and joined and even started a few great Discord servers. These all undeniably contributed to me ultimately landing my current job, but they also made me feel welcome in the industry. Hendricks particularly credits his mentors, Ean More and Bob Cousins, for giving him great advice. “My initial mentor [Ean More] I actually met through a mutual IT networking Facebook group. I noticed that he was one of the more active members, and that he was always ready to lend a hand to help others with their questions and spread a deeper understanding of programming and computer science. He also often posted snippets of his own code to share with the community and receive feedback, and I was interested in a lot of what he was posting. “I reached out to him and told him I thought it was amazing how selfless he was in the group, and that, while I’m still a junior, if there was ever any grunt work I could do under his guidance, I would be happy to do so. Since he had a history of mentoring others, he offered to do so for me, and we’ve been mentor/mentee and friends ever since.” “My other mentor [Bob Cousins],” Hendricks continued, “was a friend of my late uncle’s. My uncle had originally begun mentoring me shortly before his passing. We were connected through a mutual friend whom I lamented to about not having any clue how to continue following the path my uncle had originally laid before me. He mentioned that he knew just the right person and gave me an email address to contact. I sent an email to the address and was greeted warmly by the man who would become another mentor, and like an uncle to me.” Although Hendricks found him via a personal connection, Cousins runs a mentorship program that caters to a wide variety of industries. Wilson also believes in the power of networking—and not just for the job hunt. “One of the things I like about programming,” he said, “is that it’s a very collaborative community. Everybody wants to help everybody.” We remember that everyone had to start somewhere, and we’ll take time to help those starting out. It’s a very welcoming community. Just do it! We’re all here for you, and if you need help I’ll refer you. Asakura, by contrast, thinks that networking can help, but that it works a little differently in Japan than in other countries. “Don’t rely on it too much,” he said. “Unlike in Western countries, personal referrals don’t always lead directly to job opportunities in Japan. Your skills, effort, and consistency will matter more in the long run.” DO treat the job hunt like a job Once he’d graduated from Le Wagon, D’Alberti said, “I considered job-hunting my full-time job.” I checked all the possible networking events and meetup events that were going on in the city, and tried to attend all of them, every single day. I had a list of 10 different job boards that I would go and just refresh on a daily basis to see, ‘Okay, Is there anything new now?’ And, of course, I talked with recruiters. D’Alberti suggests beginning the search earlier than you think you need to. “I had started actively job hunting even before graduating [from Le Wagon],” he said. “That’s advice I give to everyone who joins the bootcamp. “Two weeks before graduation, you have one simple web application that you can show. You have a second one you’re working on in a team, and you have a third one that you know what it’s going to be about. So, already, there are three applications that you can showcase or you can use to explain your skills. I started going to meetups and to different events, talking with people, showing my CV.” The process wasn’t easy, as most companies and recruiters weren’t interested in hiring for junior roles. But his intensive strategy paid off within a month, as D’Albert landed three invitations to interview: one from a Japanese job board, one from a recruiter, and one from LinkedIn. For Asakura, treating job hunting like a job was as much for his mental health as for his career. “The biggest challenge was dealing with impostor syndrome and feeling like I didn’t belong because I didn’t have a computer science degree,” he explained. “I also experienced burnout from pushing myself too hard.” To cope, I stuck to a structured routine. I went to the gym daily to decompress, kept a consistent study schedule as if I were working full-time, and continued applying for jobs even when it felt hopeless. At first, Asakura tried to apply to jobs strategically by tracking each application, tailoring his resume, and researching every role. “But after dozens of rejections,” he said, “I eventually switched to applying more broadly and sent out over one hundred applications. I also reached out to friends who were already software engineers and asked for direct referrals, but unfortunately, nothing worked out.” Still, Asakura didn’t give up. He practiced interviews in both English and Japanese with his friends, and stayed in touch with recruiters. Most importantly, he kept developing and adding to his portfolio. DO make use of online resources “What ultimately helped me was staying active and visible,” Asakura said. I consistently updated my GitHub, LInkedIn, and Wantedly profiles. Eventually, I received a message on Wantedly from the CTO of a company who was impressed with my portfolio, and that led to my first developer job.” “If you have the time, certifications can also help validate your knowledge,” Asakura added, “especially in fields like cloud and AI. Some people may not realize this, but the rise of artificial intelligence is closely tied to the growth of cloud computing. Earning certifications such as AWS, Kubernetes, and others can give you a strong foundation and open new opportunities, especially as these technologies continue to evolve.” Hendricks also heavily utilized LinkedIn and similar sites, though in a slightly different way. “I would also emphasize the importance of knowing how to use job-hunting sites like Indeed and LinkedIn,” he said. “I had the best luck when I used them primarily to do initial research into companies, then applied directly through the companies’ own websites, rather than through job postings that filter applicants before their resumés ever make it to the actual people looking to hire.” In addition, Hendricks recommends studying coding interview prep tutorials from freeCodeCamp. Along with advice from his mentors and the online communities he joined, he credits those tutorials with helping him successfully receive offers after a long job hunt. DO highlight experience with Japanese culture and language Asakura felt that his experience in Japan, and knowledge of Japanese, gave him an edge. “I understand Japanese work culture [and] can speak the language,” Asakura said, “and as a Japanese national I didn’t require visa sponsorship. That made me a lower-risk hire for companies here.” Hendricks also felt that his excellent Japanese made him a more attractive hire. While applying, he emphasized to companies that he could be a bridge to the global market and business overseas. However, he also admitted this strategy steered him towards applying with more domestic Japanese companies, which were also less likely to hire someone without a computer science degree. “So,” he said, “it sort of washed out.” Wilson is another who put a lot of emphasis on his Japanese language skills, from a slightly different angle. A lot of interviewees typically don’t speak Japanese well . . . and a lot of companies here say that they’re very international, but if they want very good programmers, [those people] spend their lives programming, not studying English. So having somebody who can bridge the language gap on the IT side can be helpful. DO lean into your other experience Several career switchers discovered that their past experiences and skills, while not immediately relevant to their new career, still proved quite helpful in landing that first role—sometimes in very unexpected ways. When Wilson was pitching his language skills to companies, he wasn’t talking about just Japanese–English translation. He also highlighted his prior experience in sales to suggest that he could help communicate with and educate non-technical audiences. “Actually to be a software engineer, there’s a lot of technical communication you have to do.” I have worked with some incredible coders who are so good at the technical side and just don’t want to do the personal side. But for those of us who are not super-geniuses and can’t rely purely on our tech skills . . . there’s a lot of non-technical discussion that goes around building a product.” This strategy, while eventually fruitful, didn’t earn Wilson a job right away. Initially, he applied to more than sixty companies over the course of three to four months. “I didn’t have any professional [coding] experience, so it was actually quite a rough time,” he said. “I interviewed all over the place. I was getting rejected all over town.” The good news was, Wilson said, “I’m from Chicago. I don’t know what it is, but there are a lot of Chicagoans who work in Tokyo for whatever reason.” When he finally landed an interview, one of the three founders of the company was also from Chicago, giving them something in common. “We hit it off really well in the interview. I think that kind of gave me the edge to get the role, to be honest.” Like Wilson, D’Alberti found that his previous work as a marketer helped him secure his first developer role—which was ironic, he felt, given that he’d partially chosen to switch careers because he hadn’t been able to find an English-language marketing job in Japan. “I had my first interview with the CEO,” he told me, “and this was for a Japanese startup that was building chatbots, and they wanted to expand into the English market. So I talked with the CEO, and he was very excited to get to know me and sent me to talk with the CTO.” The CTO, unfortunately, wasn’t interested in hiring a junior developer with no professional experience. “And I thought that was the end of it. But then I got called again by the CEO. I wanted to join for the engineering position, and he wanted to have me for my marketing experience.” In the end we agreed that I would join in a 50-50 arrangement. I would do 50 percent of my job in marketing and going to conferences and talking to people, and 50 percent on the engineering side. I was like, ‘Okay, I’ll take that.’ This ended up working better than D’Alberti had expected, partially due to external circumstances. “When COVID came, we couldn’t travel abroad, so most of the job I was doing in my marketing role I couldn’t perform anymore. “So they sat me down and [said], ‘What are we going to do with you, since we cannot use you for marketing anymore?’ And I was like, ‘Well, I’m still a software developer. I could continue working in that role.’ And that actually allowed me to fully transition.” DON’T make these mistakes It was D’Alberti’s willingness to compromise on that first development role that led to his later success, so he would explicitly encourage other career-changers to avoid, in his own words, “being too picky.” This advice is based, not just on his own experience, but also on his time working as a teaching assistant at Le Wagon. “There were a couple of people who would be like, ‘Yeah, I’d really like to find a job and I’m not getting any interviews,’” he explained. “And then we’d go and ask, ‘Okay, how many companies are you applying to? What are you doing?’ But [they’d say] ‘No, see, [this company] doesn’t offer enough’ or ‘I don’t really like this company’ or ‘I’d like to do something else.’ Those who would be really picky or wouldn’t put in the effort, they wouldn’t land a job. Those who were deadly serious about ‘I need to get a job as a software developer,’ they’d find one. It might not be a great job, it might not be at a good company, but it would be a good first start from which to move on afterwards. Asakura also knew some other bootcamp graduates who struggled to find work. “A major reason was a lack of Japanese language skills,” he said. Even for junior roles, many companies in Japan require at least conversational Japanese, especially domestic ones. On the other hand, if you prioritize learning Japanese, that can give you an edge on entering the industry: “Many local companies are open to training junior developers, as long as they see your motivation and you can communicate effectively. International companies, on the other hand, often have stricter technical requirements and may pass on candidates without degrees or prior experience.” Finally, Hendricks said that during his own job hunt, “Not living in Tokyo was a problem.” It was something that he was able to overcome via diligent digital networking, but he’d encourage career-changers to think seriously about their future job prospects before settling outside a major metropolis in Japan. Their top advice I asked each developer to share their number one piece of advice for career-changers. D’Alberti wasn’t quite sure what to suggest, given recent changes in the tech market overall. “I don’t have clear advice to someone who’s trying to break into tech right now,” he said. “It might be good to wait and see what happens with the AI path. Might be good to actually learn how to code using AI, if that’s going to be the way to distinguish yourself from other junior developers. It might be to just abandon the idea of [being] a linear software developer in the traditional sense, and maybe look more into data science, if there are more opportunities.” But assuming they still decide ‘Yes, I want to join, I love the idea of being a software developer and I want to go forward’ . . . my main suggestion is patience. “It’s going to be tough,” he added. By contrast, Hendricks and Wilson had the same suggestion: if you want to change careers, then go for it, full speed ahead. “Do it now, or as soon as you possibly can,” Hendricks stated adamantly. His life has been so positively altered by discovering and pursuing his passion, that his only regret is he didn’t do it sooner. Wilson said something strikingly similar. “Do it. Just do it. I went back and forth a lot,” he explained. “‘Oh, should I do this, it’s so much money, I already have a job’ . . . just rip the bandaid off. Just do it. You probably have a good reason.” He pointed out that while starting over and looking for work is scary, it’s also possible that you’ll lose your current job anyway, at which point you’ll still be job hunting but in an industry you no longer even enjoy. “If you keep at it,” he said, “you can probably do it.” “Not to talk down to developers,” he added, “but it’s not the hardest job in the world. You have to study and learn and be the kind of person who wants to sit at the computer and write code, but if you’re thinking about it, you’re probably the kind of person who can do it, and that also means you can probably weather the awful six months of job hunting.” You only need to pass one job interview. You only need to get your foot in the door. Asakura agreed with “just do it,” but with a twist. “Build in public,” he suggested. “Share your progress. Post on GitHub. Keep your LinkedIn active.” Let people see your journey, because even small wins build momentum and credibility. “To anyone learning to code right now,” Asakura added, “don’t get discouraged by setbacks or rejections. Focus on building, learning, and showing up every day. Your portfolio speaks louder than your past, and consistency will eventually open the door.” If you want to read more how-tos and success stories around networking, working with recruitment agencies, writing your resume, etc., check out TokyoDev’s other articles. If you’d like to hear more about being a developer in Japan, we invite you to join the TokyoDev Discord, which has over 6,000 members as well as dedicated channels for resume review, job posts, life in Japan, and more.

3 days ago • 12 votes

New here?