
Home on Erik Bernhardsson

Waiting time, load factor, and queueing theory: why you need to cut your systems a bit of slack
over a year ago
18
I've been reading up on operations research lately, including queueing theory. It started out as a way to understand the very complex mortgage process (I work at a mortgage startup) but it's turned into my little hammer and now I see nails everywhere.
The number of letters in the word for each number
over a year ago
13
Just for fun, I generated these graphs of the number of letters in the word for each number. I really spent about 10 minutes on this (ok…possibly also another 40 minutes tweaking the plots): More languages!
Benchmarking nearest neighbor libraries in Python
over a year ago
13
Radim Rehurek has put together an excellent summary of approximate nearest neighbor libraries in Python. This is exciting, because one of the libraries he's covering, annoy, was built by me. After introducing the problem, he goes through the list of contestants and sticks with...
Better precision and faster index building in Annoy
over a year ago
12
Sometimes you have these awesome insights. A few days ago I got an idea for how to improve index building in Annoy. For anyone who isn't acquainted with Annoy – it's a C++ library with Python bindings that provides fast high-dimensional nearest neighbor search.
What can startups learn from Koch Industries?
over a year ago
12
I recently finished the excellent book Kochland. This isn't my first interest in Koch—I read The Science of Success by Charles Koch himself a couple of years ago. Charles Koch inherited a tiny company in 1967 and turned it into one of the world's largest ones.
Nearest neighbors and vector models – epilogue – curse of dimensionality
over a year ago
12
This is another post based on my talk at NYC Machine Learning. The previous two parts covered most of the interesting parts, but there are still some topics left to be discussed. To go back and read the meaty stuff, check out
Plotting author statistics for Git repos using Git of Theseus
over a year ago
12
I spent a few days during the holidays fixing up a bunch of semi-dormant open source projects and I have a couple of blog posts in the pipeline about various updates. First up, I made a number of fixes to Git of Theseus which is a tool (written in Python) that generates...
Miscellaneous unsolicited (and possibly biased) career advice
over a year ago
11
No one asked for this, but I'm something like ~12 years into my career and have had my fair share of mistakes and luck so I thought I'd share some. Honestly, I feel like I've mostly benefitted from luck.
It's hard to write code for computers, but it's even harder to write code for humans
4 months ago
11
Writing code for a computer is hard enough. You take something big and fuzzy, some large vague business outcome you want to achieve. Then you break it down recursively and think about all the cases until you have clear logical statements a computer can follow.
Predicting solar eclipses with Python
10 months ago
11
As I am en route to see my first total solar eclipse, I was curious how hard it would be to compute eclipses in Python. It turns out, ignoring some minor coordinate system head-banging, I was able to get something half-decent working in a couple of hours.
We are still early with the cloud: why software development is overdue for a change
over a year ago
11
This is in many respects a successor to a blog post I wrote last year about what I want from software infrastructure, but the ideas morphed in my head into something sort of wider.
What is the right level of specialization? For data teams and anyone else.
over a year ago
11
This isn't as much of a blog post as an elaboration of a tweet I posted the other day: I think this specialization of data teams into 99 different roles (data scientist, data engineer, analytics engineer, ML engineer etc) is generally a bad thing driven by the fact that tools are...
Developer experience as a competitive advantage
over a year ago
11
I spent a ton of time looking at different software providers, both as a CTO, and as a nerd “advanced” consumer who builds stuff in my spare time. In the last 10 years, there has been an order of magnitude more products that cater directly to developers, through APIs, SDKs, and...
Language pitch
over a year ago
11
Here's a fun analysis that I did of the pitch (aka. frequency) of various languages. Certain languages are simply pronounced with lower or higher pitch. Whether this is a feature of the language or more a cultural thing is a good question, but there are some substantial...
Annoying blog post
over a year ago
11
I spent a couple of hours this weekend going through some pull requests and issues to Annoy, which is an open source C++/Python library for Approximate Nearest Neighbor search. I set up Travis-CI integration and spent some time on one of the issues that multiple people had...
Pareto efficiency
over a year ago
11
Pareto efficiency is a useful concept I like to think about. It often comes up when you compare items on multiple dimensions. Say you want to buy a new TV. To simplify it let's assume you only care about two factors: price and quality.
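To make the TV example concrete, here is a minimal sketch of how one might pick out the Pareto-efficient options from a list. The TV names, prices, and quality scores are invented for illustration, and it assumes lower price and higher quality are both better.

```python
from typing import NamedTuple


class TV(NamedTuple):
    name: str       # hypothetical model name
    price: float    # lower is better
    quality: float  # higher is better


def dominates(a: TV, b: TV) -> bool:
    """True if a is at least as good as b on both factors and strictly better on one."""
    no_worse = a.price <= b.price and a.quality >= b.quality
    strictly_better = a.price < b.price or a.quality > b.quality
    return no_worse and strictly_better


def pareto_frontier(tvs: list[TV]) -> list[TV]:
    """Keep only the TVs that no other TV dominates."""
    return [t for t in tvs if not any(dominates(o, t) for o in tvs)]


# Made-up data: C costs more than B and is worse, so it is dominated.
tvs = [
    TV("A", 300, 6.0),
    TV("B", 500, 8.0),
    TV("C", 550, 7.0),
    TV("D", 900, 9.5),
]
frontier = pareto_frontier(tvs)
print([t.name for t in frontier])
```

Everything left on the frontier is a defensible choice: moving from one to another always trades price against quality.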
Functional programming is the libertarianism of software engineering
over a year ago
11
This is a pretty dumb post, in which I argue that functional programming has a lot of the bad parts of libertarianism and a lot of the good parts: Both ideologies strive to eliminate [the] state.
The software engineering rule of 3
over a year ago
11
Here's a dumb extremely accurate rule I'm postulating for software engineering projects: you need at least 3 examples before you solve the right problem. This is what I've noticed: Don't factor out shared code between two classes.
Tumblr's awesome project names
over a year ago
11
Not sure how I managed to miss this, but I'm watching this Tumblr presentation and they talk about their projects named after Arrested Development topics: Gob, Parmesan, Buster, Jetpants, Oscar, George and Motherboy. Still, the best software project name is probably still Apple's...
Why software projects take longer than you think: a statistical model
over a year ago
10
Anyone who has built software for a while knows that estimating how long something is going to take is hard. It's hard to come up with an unbiased estimate of how long something will take, when fundamentally the work in itself is about solving something.
Nearest neighbors and vector models – part 2 – algorithms and data structures
over a year ago
10
This is a blog post rewritten from a presentation at NYC Machine Learning on Sep 17. It covers a library called Annoy that I have built that helps you do nearest neighbor queries in high dimensional spaces.
Deep learning for… Go
over a year ago
10
This is the last post about deep learning for chess/go/whatever. But this really cool paper by Christopher Clark and Amos Storkey was forwarded to me by Michael Eickenberg. It's about using convolutional neural networks to play Go.
Simple sabotage for software
a year ago
10
CIA produced a fantastic book during the peak of World War 2 called Simple Sabotage. It laid out various ways for infiltrators to ruin productivity of a company. Some of the advice is timeless, for instance the section about “General interference with Organizations and...
I'm looking for data engineers
over a year ago
10
I'm interrupting the regular programming for a quick announcement: we're looking for data engineers at Better. You would be the first one to join and would work a lot directly with me. Some fun things you could work on (these are all projects I'm working on right now):
Optimizing for iteration speed
over a year ago
10
I've written before about the importance of iterating quickly but I didn't necessarily talk about some concrete things you can do. When I've built up the tech team at Better, I've intentionally optimized for fast iteration speed above almost everything else.
Benchmark of Approximate Nearest Neighbor libraries
over a year ago
10
Annoy is a library written by me that supports fast approximate nearest neighbor queries. Say you have a high (1-1000) dimensional space with points in it, and you want to find the nearest neighbors to some point.
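For reference, the exact version of that query can be sketched as an exhaustive scan. This is not Annoy's API or algorithm, just the brute-force baseline that approximate libraries try to beat on speed; the dimensionality, point count, and random data are assumptions made for illustration.

```python
import math
import random


def nearest_neighbors(points, query, k):
    """Return indices of the k points closest to query (Euclidean), by exhaustive scan."""
    dists = [(math.dist(p, query), i) for i, p in enumerate(points)]
    dists.sort()
    return [i for _, i in dists[:k]]


random.seed(0)
dim = 40  # hypothetical dimensionality
points = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(1000)]

# Query with one of the stored points, so its own index should come back first.
query = points[42]
result = nearest_neighbors(points, query, k=5)
print(result)
```

Every query touches every point, so the cost grows linearly with the size of the data set. That is the cost an approximate index gives up some accuracy to avoid.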
What's up with music recommendations?
over a year ago
10
I just answered a Quora question about what, if any, are the differences in the algorithms that are behind recommendations for music and movies. Of course, every media type is different. For instance, there are fundamental reasons why latent factor models work really well for...
How to build up a data team (everything I ever learned about recruiting)
over a year ago
10
During my time at Spotify, I've reviewed thousands of resumes and interviewed hundreds of people. Lots of them were rejected but lots of them also got offers. Finally, I've also had my share of offers rejected by the candidate.
statself.com
over a year ago
10
Btw I just put something up online that I spent a couple of evenings on my couch putting together: it's a website where you can track any numerical data on the web. Want to know how many Twitter followers you have?
Annoy
over a year ago
10
Annoy is a simple package to find approximate nearest neighbors (ANN) that I just put on Github. I'm not trying to compete with existing packages, but Annoy has a couple of features that make it pretty useful.
More recommender algorithms
over a year ago
10
I wanted to share some more insight into the algorithms we use at Spotify. One matrix factorization algorithm we have used for a while assumes that we have user vectors $$ \mathbf{a}_u $$ and item vectors $$ \mathbf{b}_i $$.
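As a rough illustration of what user and item vectors buy you, the sketch below ranks items for a user by the dot product of the two vectors. This is a common scoring choice in matrix factorization and not necessarily the model the post goes on to describe; the vectors, user IDs, and item IDs are all made up.

```python
def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical 2-dimensional latent vectors.
user_vectors = {"u1": [0.9, 0.1], "u2": [0.2, 0.8]}
item_vectors = {"i1": [1.0, 0.0], "i2": [0.0, 1.0], "i3": [0.5, 0.5]}


def rank_items(user):
    """Sort items by the score a_u . b_i, highest first."""
    a_u = user_vectors[user]
    scores = {item: dot(a_u, b_i) for item, b_i in item_vectors.items()}
    return sorted(scores, key=scores.get, reverse=True)


print(rank_items("u1"))
print(rank_items("u2"))
```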
Data architecture vs backend architecture
over a year ago
10
A modern tech stack typically involves at least a frontend and backend but relatively quickly also grows to include a data platform. This typically grows out of the need for ad-hoc analysis and reporting but possibly evolves into a whole oil refinery of cronjobs, dashboards, bulk...
Deep learning for... chess
over a year ago
10
I've been meaning to learn Theano for a while and I've also wanted to build a chess AI at some point. So why not combine the two? That's what I thought, and I ended up spending way too much time on it.
NYC subway math
over a year ago
10
Apparently MTA (the company running the NYC subway) has a real-time API. My fascination for the subway takes autistic proportions and so obviously I had to analyze some of the data. The documentation is somewhat terrible, but here's some relevant code for how to use the API:
Ping the world
over a year ago
10
I just pinged a few million random IP addresses from my apartment in NYC. Here's the result: Some notes: What's going on with Sweden? Too much torrenting? Ireland is likewise super slow, but not Northern Ireland. Eastern Ukraine is also super slow, maybe not surprising given...
It's called Berkson's paradox!
over a year ago
10
As noted by multiple tweets, my previous post describes a phenomenon denoted Berkson's paradox. Here's another example: Why Are Handsome Men Such Jerks?
Exploding offers are bullshit
over a year ago
10
I do a lot of recruiting and have given maybe 50 offers in my career. Although many companies do, I never put a deadline on any of them. Unfortunately, I've often ended up competing with other companies who do, and I feel really bad that this usually tricks younger developers...
Norvig's claim that programming competitions correlate negatively with being good on the job
over a year ago
10
I saw a bunch of tweets over the weekend about Peter Norvig claiming there's a negative correlation between being good at programming competitions and being good at the job. There were some decent Hacker News comments on it.
Vote for our SXSW panel!
over a year ago
10
If you have a few minutes, you should check out mine and Chris Johnson’s panel proposal. Go here and vote: http://panelpicker.sxsw.com/vote/24504 Algorithmic Music Discovery at Spotify: Spotify crunches hundreds of billions of streams to analyze users' music taste and provide...
The eigenvector of "Why we moved from language X to language Y"
over a year ago
10
I was reading yet another blog post titled “Why our team moved from <language X> to <language Y>” (I forgot which one) and I started wondering if you can generalize it a bit. Is it possible to generate an N × N contingency table of moving from language X to language Y?
Antipodes
over a year ago
10
I was playing around with D3 last night and built a silly visualization of antipodes and how our intuitive understanding of the world sometimes doesn't make sense. Check out the visualization at bl.ocks.org! Basically the idea is if you fly from Beijing to Buenos Aires then you...
Installing TensorFlow on AWS
over a year ago
10
Curious about Google's newly released TensorFlow? I don't have a beefy GPU machine, so I spent some time getting it to run on EC2. The steps on how to reproduce it are pretty brutal and I wouldn't recommend going through it unless you want to waste five hours of your life.
Model benchmarks
over a year ago
10
A lot of people have asked me what models we use for recommendations at Spotify so I wanted to share some insights. Here are benchmarks for some models. Note that we don't use all of them in production.
Luigi success
over a year ago
10
So Luigi, our open sourced workflow engine in Python, just recently passed 1,000 stars on Github, then shortly after passed mrjob as (I think) the most popular Python package to do Hadoop stuff. This is exciting!
More Luigi alternatives
over a year ago
10
The workflow engine battle has intensified with some more interesting entries lately! Here are a couple I encountered in the last few days. I love that at least two of them are direct references to Luigi!
Being data driven
over a year ago
9
I picked up an issue of Foreign Affairs while flying back to NYC from SFO. It features this long interview with U.S. General Stanley McChrystal and I thought it was pretty interesting how striking some of the similarities are between fighting in a war and developing software.
Dollar cost averaging
over a year ago
9
(I accidentally published an unfinished draft of this post a few days ago – sorry about that). There are a lot of sources preaching the benefits of dollar cost averaging, or the practice of investing a fixed amount of money regularly.
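The arithmetic behind "investing a fixed amount regularly" can be sketched in a few lines. A fixed amount buys more shares when the price is low, so the average cost per share is the harmonic mean of the prices, which is never above their arithmetic mean. The prices and amount below are invented, and the sketch says nothing about whether the practice beats investing a lump sum.

```python
# Hypothetical share prices on three purchase dates, and a fixed amount per purchase.
prices = [100.0, 50.0, 200.0]
amount = 300.0

# A fixed amount buys more shares when the price is low.
shares_bought = [amount / p for p in prices]
total_shares = sum(shares_bought)
total_spent = amount * len(prices)

avg_cost_per_share = total_spent / total_shares
mean_price = sum(prices) / len(prices)

print(f"shares bought per purchase: {shares_bought}")
print(f"average cost per share:     {avg_cost_per_share:.2f}")
print(f"arithmetic mean price:      {mean_price:.2f}")
```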
MLConf 2014
over a year ago
9
Just spent a day at MLConf where I was talking about how we do music recommendations. There was a whole range of great speakers (actually almost 2/3 women which was pretty cool in itself). Here are my slides:
Welcome Echo Nest!
over a year ago
9
In case you missed it, we just acquired a company called Echo Nest in Boston. These people have been obsessed with understanding music for the past 8 years since it was founded by Brian Whitman and Tristan Jehan out of MIT Medialab.
hdfs2cass
over a year ago
9
Just open sourced hdfs2cass which is a Hadoop job (written in Java) to do efficient Cassandra bulkloading. The nice thing is that it queries Cassandra for its topology and uses that to partition the data so that each reducer can upload data directly to a Cassandra node.
Fermat's principle
over a year ago
9
I was browsing around on the Internet and the physics geek in me started reading about Fermat's principle. And suddenly something came back to me that I've been trying to suppress for many years – how I never understood why there's anything fundamental about the principle of...
Why conversion matters: a toy model
over a year ago
9
There are often close relationships between top level business metrics. For instance, it's well known that retention has a super strong impact on the valuation of a subscription business. Or that the % of occupied seats is super important for an airline.
Pinterest open sources Pinball
over a year ago
9
Pinterest just open sourced Pinball which seems like an interesting Luigi alternative. There are two blog posts: Pinball: Building workflow management (from 2014) and Open-sourcing Pinball (from this week). The author has a comment in the comments thread on Hacker News:
3D
over a year ago
9
Andy Sloane decided to call my 2D visualization and raise it to 3D. (Looks a little weird in the iframe but check out the link). It's based on an LDA model with 200 topics, so the artists tend to stick to clusters where each cluster is a topic.
Everything I learned about technical debt
over a year ago
9
I just made it to Sweden suffering from jet lag induced insomnia, but this blog post will not cover that. Instead, I will talk a little bit about technical debt. The concept of technical debt always resonated with me, partly because I always like the analogy with “real” debt.
Interview with a Data Scientist: Erik Bernhardsson
over a year ago
9
I was featured in Peadar Coyle's interview series interviewing various “data scientists” – which is kind of arguable since (a) all the other ppl in that series are much cooler than me (b) I'm not really a data scientist.
Presentation about Luigi
over a year ago
I like the editing!
NoDoc
over a year ago
9
We had an unconference at Spotify last Thursday and I added a semi-trolling semi-serious topic about abolishing documentation. Or NoDoc, as I'm going to call this movement. This was meant to be mostly a thought experiment, but I don't see it as complete madness.
Open source
over a year ago
MCMC for marketing data
over a year ago
9
The other day I was looking at marketing spend broken down by channel and wanted to compute some simple uncertainty estimates. I have data like this, with columns for total spend and transactions: Channel A, 2292.
Implicit data and collaborative filtering
over a year ago
9
A lot of people these days know about collaborative filtering. It's that Netflix Prize thing, right? People rate things 1-5 stars and then you have to predict missing ratings. While there's no doubt that the Netflix Prize was successful, I think it created an illusion that all...
Snakebite
over a year ago
9
Just promoting Spotify stuff here: check out the Snakebite repo on Github, written by Wouter de Bie. It's a super fast tool to access HDFS over CLI/Python, by accessing the namenode directly over sockets/protobuf. Spotify's developer blog features a nice blog post outlining what...
My issue with GPU-accelerated deep learning
over a year ago
9
I've been spending several hundred bucks renting GPU instances on AWS over the last year. The speedup from a GPU is awesome and hard to deny. GPUs have taken over the field. Maybe following the footsteps of Bitcoin mining there's some research on using FPGA (I know very little...
The relationship between commit size and commit message size
over a year ago
9
Wow I guess it was more than a year ago that I tweeted this. Crazy how time flies by. Anyway, here's my rationale: When I update one line of code I feel like I have to put in a long explanation about its side effects, why it's fully backwards compatible, and why it fixes some...
What I have been working on: Modal
over a year ago
9
Long story short: I'm working on a super cool tool called Modal. Please check it out — it lets you run things in the cloud without having to think about infrastructure. Scaling out, scheduling, containerization, using GPUs, setting up webhooks, and all kinds of other stuff.
The power of ensembles
over a year ago
9
From my presentation at MLConf, one of the points I think is worth stressing again is how extremely well combining different algorithms works. In this case, we're training machine learning algorithms on different data sets (playlists, play counts, sessions) and different...
Are data sets the new server rooms?
over a year ago
9
This blog post Data sets are the new server rooms makes the point that a bunch of companies raise a ton of money to go get really proprietary awesome data as a competitive moat. Because once you have the data, you can build a better product, and no one can copy it (at least not...
Home
over a year ago
3D in D3
over a year ago
9
I have spent some time lately with D3. It's a lot of fun to build interactive graphs. See for instance this demo (will provide a longer writeup soon). D3 doesn't have support for 3D but you can do projections into 2D pretty easily.
I believe in the 10x engineer, but...
over a year ago
9
The easiest way to be a 10x engineer is to make 10 other engineers 2x more efficient. Someone can be a 10x engineer if they do nothing for 364 days and then convince the team to change programming language to a 2x more productive language.
More Luigi!
over a year ago
9
Elias Freider just talked about Luigi at PyData 2013: The presentation above is much better than one I put together a few weeks ago. In case anyone is interested I'll include it too:
I already found the best gifs
over a year ago
9
Just search for “hackers gif”. There you go. Fun for your work emails for the next 500 years. From the awesome movie Hackers. That movie together with The Warriors convinced me that I wanted to live in NYC when I was like… 14 years old.
Scala Data Pipelines for Music Recommendations
over a year ago
Chris Johnson’s presentation from Data Day Texas:
σ-driven project management: when is the optimal time to give up?
over a year ago
9
Hi! It's your friendly project management theoretician. You might remember me from blog posts such as Why software projects take longer than you think, which is a blog post I wrote a long time ago positing that software project completion times follow a log-normal distribution.
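One consequence of a log-normal assumption is easy to check by simulation: the distribution is skewed, so the mean completion time sits well above the median. The sketch below draws "actual time / estimated time" ratios from a log-normal distribution. The parameters mu = 0 and sigma = 1 are assumptions picked for illustration, not values taken from the post.

```python
import math
import random
import statistics

random.seed(1)

# Assumed parameters of the underlying normal distribution (illustrative only).
mu, sigma = 0.0, 1.0

# Each sample is a hypothetical ratio of actual time to estimated time.
ratios = [random.lognormvariate(mu, sigma) for _ in range(100_000)]

median = statistics.median(ratios)
mean = statistics.fmean(ratios)

# In theory the median is exp(mu) and the mean is exp(mu + sigma**2 / 2).
print(f"median ratio: {median:.2f} (theory {math.exp(mu):.2f})")
print(f"mean ratio:   {mean:.2f} (theory {math.exp(mu + sigma**2 / 2):.2f})")
```

With these parameters the typical project lands close to its estimate, but the long right tail pulls the average noticeably higher.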
In defense of false positives (why you can't fail with A/B tests)
over a year ago
9
Many years ago, I used to think that A/B tests were foolproof and all you need to do is compare the metrics for the two groups. The group with the highest conversion rate wins, right?
Approximate nearest news
over a year ago
9
As you may know, one of my (very geeky) interests is approximate nearest neighbor methods, and I'm the author of a Python package called Annoy. I've also built a benchmark suite called ann-benchmarks to compare different packages.
Presentations about Spotify music recommendations
over a year ago
9
A couple of people in my old team have been around talking about how Spotify does music recommendations and put together some quite good presentations. First one is Neville Li's presentation about Scala Data Pipelines @ Spotify:
Home on Erik...
Running Theano on EC2 Inspired by Sander Dieleman's internship at Spotify, I've been playing around with deep learning...
over a year ago
9
over a year ago
Inspired by Sander Dieleman's internship at Spotify, I've been playing around with deep learning using Theano. Theano is this Python package that lets you define symbolic expressions (cool), does automatic differentiation (really cool), and compiles it down into bytecode to run...
Home on Erik...
2D embedding of 5k artists = WIN I'm at KDD in Chicago for a few days. We have a Spotify booth tomorrow, and I wanted to put together...
over a year ago
9
over a year ago
I'm at KDD in Chicago for a few days. We have a Spotify booth tomorrow, and I wanted to put together some cool graphics to show. I've been thinking about doing a 2D embedding of the top artists forever since I read about t-SNE and other papers so this was a perfect opportunity to...
Home on Erik...
Luigi: complex pipelines of tasks in Python I'm shamelessly promoting my first major open source project. Luigi is a Python module that helps...
over a year ago
9
over a year ago
I'm shamelessly promoting my first major open source project. Luigi is a Python module that helps you build complex pipelines of batch jobs, handle dependency resolution, and create visualizations to help manage multiple workflows. It also comes with Hadoop support built in...
Home on Erik...
Ratio metrics We run a ton of A/B tests at Spotify and we look at a ton of metrics. Defining metrics is a little...
over a year ago
9
over a year ago
We run a ton of A/B tests at Spotify and we look at a ton of metrics. Defining metrics is a little bit of an art form. Ideally you want to define success metrics before you run a test to avoid cherry picking metrics.
Home on Erik...
Learning from users faster using machine learning I had an interesting idea a few weeks ago, best explained through an example. Let's say you're...
over a year ago
8
over a year ago
I had an interesting idea a few weeks ago, best explained through an example. Let's say you're running an e-commerce site (I kind of do) and you want to optimize the number of purchases. Let's also say we try to learn as much as we can from users, both using A/B tests but also...
Home on Erik...
Meta-blogging (This is not a very relevant/useful post for regular readers – feel free to skip. I thought I would...
over a year ago
8
over a year ago
(This is not a very relevant/useful post for regular readers – feel free to skip. I thought I would share it so people can find it on Google.) My blog blew up twice in a week earlier this year when I landed on Hacker News.
Home on Erik...
About
over a year ago
Home on Erik...
My favorite management failures For most people straight out of school, work life is a bit of a culture shock. For me it was an...
over a year ago
8
over a year ago
For most people straight out of school, work life is a bit of a culture shock. For me it was an awesome experience, but a lot of the constraints were different and I had to learn to optimize for different things.
Home on Erik...
The Filter Bubble is Silly and you Can't Guess What Happened Next I'm at RecSys 2014, meeting a lot of people and hanging out at talks. Some of the discussions here...
over a year ago
8
over a year ago
I'm at RecSys 2014, meeting a lot of people and hanging out at talks. Some of the discussions here were about the filter bubble, which prompted me to formalize my own thoughts. I firmly believe that it's the role of a system to respect the user's intent.
Home on Erik...
The half-life of code & the ship of Theseus As a project evolves, does the new code just add on top of the old code? Or does it replace the old...
over a year ago
8
over a year ago
As a project evolves, does the new code just add on top of the old code? Or does it replace the old code slowly over time? In order to understand this, I built a little thing to analyze Git projects, with help from the formidable GitPython project.
Home on Erik...
More Luigi: Presentation from OSCON I was in Portland, OR for a few days hanging out at OSCON. Was fun. I also talked a bit about...
over a year ago
8
over a year ago
I was in Portland, OR for a few days hanging out at OSCON. Was fun. I also talked a bit about Luigi: Next week I'm presenting at the NYC Predictive Analytics meetup together with Blake Shaw from Foursquare.
Home on Erik...
New approximate nearest neighbor benchmarks As some of you may know, one of my side interests is approximate nearest neighbor algorithms. I'm...
over a year ago
8
over a year ago
As some of you may know, one of my side interests is approximate nearest neighbor algorithms. I'm the author of Annoy, a library with 3,500+ stars on Github as of today. It offers fast approximate search for nearest neighbors with the additional benefit that you can load data...
Home on Erik...
Stuff that bothers me: “100x faster than Hadoop” The simple way to get featured on big data blogs these days seems to be Build something that does 1...
over a year ago
8
over a year ago
The simple way to get featured on big data blogs these days seems to be: (1) Build something that does 1 thing super well but nothing else. (2) Benchmark it against Hadoop. (3) Publish stats showing that it's 100x faster than Hadoop. (4) $$$. Spark claims they're 100x faster than Hadoop and there's a...
Home on Erik...
There is no magic trick (Warning: super speculative, feel free to ignore) As Yogi Berra said, “It's tough to make...
over a year ago
8
over a year ago
(Warning: super speculative, feel free to ignore) As Yogi Berra said, “It's tough to make predictions, especially about the future”. Unfortunately predicting is hard, and unsurprisingly people look for the Magic Trick™ that can resolve all the uncertainty.
Home on Erik...
Luigi talk tomorrow At NYC Data Science meetup! Unfortunately the space is full but the talk will be livestreamed –...
over a year ago
8
over a year ago
At NYC Data Science meetup! Unfortunately the space is full but the talk will be livestreamed – check out the meetup web page for a link tomorrow.
Home on Erik...
Nearest neighbor methods and vector models – part 1 This is a blog post rewritten from a presentation at NYC Machine Learning last week. It covers a...
over a year ago
8
over a year ago
This is a blog post rewritten from a presentation at NYC Machine Learning last week. It covers a library called Annoy that I have built that helps you do (approximate) nearest neighbor queries in high dimensional spaces.
Home on Erik...
Slides from NYC Machine Learning talk Slides from the talk. Slightly edited because (a) some of the slides make little sense taken out of...
over a year ago
8
over a year ago
Slides from the talk. Slightly edited because (a) some of the slides make little sense taken out of context and (b) Slideshare seems to have problems converting some of the stuff. Collaborative filtering at Spotify from Erik Bernhardsson
Home on Erik...
Modeling conversion rates using Weibull and gamma distributions This is a blog post originally featured on the Better engineering blog. If you want to link to this...
over a year ago
8
over a year ago
This is a blog post originally featured on the Better engineering blog. If you want to link to this article or share it, please go to the original post URL! Separately, I'm sorry it's been so long with no posts on this blog.
Home on Erik...
New benchmarks for approximate nearest neighbors UPDATE(2018-06-17): There is a later blog post with newer benchmarks! One of my super nerdy...
over a year ago
8
over a year ago
UPDATE(2018-06-17): There is a later blog post with newer benchmarks! One of my super nerdy interests is approximate algorithms for nearest neighbors in high-dimensional spaces. The problem is simple. You have, say, 1M points in some high-dimensional space.
Home on Erik...
Blogroll Remember when everyone had a really ugly blog with a blogroll? Anyway, just think the word is...
over a year ago
8
over a year ago
Remember when everyone had a really ugly blog with a blogroll? Anyway, just think the word is funny. I follow a few hundred blogs using Feedly and Reeder and have been reading a few hundred thousand blog posts over the last 10 years.
Home on Erik...
I don't want to learn your garbage query language This is a bit of a rant but I really don't like software that invents its own query language....
over a year ago
8
over a year ago
This is a bit of a rant but I really don't like software that invents its own query language. There's a trillion different ORMs out there. Another trillion databases with their own query language. Another trillion SaaS products where the only way to query is to learn some random...
Home on Erik...
A brief history of Hadoop at Spotify I was talking with some data engineers at Spotify and had a moment of nostalgia. 2008 I was writing...
over a year ago
8
over a year ago
I was talking with some data engineers at Spotify and had a moment of nostalgia. 2008 I was writing my master's thesis at Spotify and had to run a Hadoop job to extract some data from the logs.
Home on Erik...
Black Box Machine Learning in the Cloud There's a bunch of companies working on machine learning as a service. Some old companies like...
over a year ago
8
over a year ago
There's a bunch of companies working on machine learning as a service. Some old companies like Google, but now also Amazon and Microsoft. Then there's a ton of startups: PredictionIO ($2.7M funding), BigML ($1.6M funding), Clarifai, etc, etc.
Home on Erik...
How to hire smarter than the market: a toy model Let's consider a toy model where you're hiring for two things and that those are equally valuable....
over a year ago
8
over a year ago
Let's consider a toy model where you're hiring for two things and that those are equally valuable. It's not very important what those are, so let's just call them “thing A” and “thing B” for now.
Home on Erik...
Some more font links My blog post about fonts generated lots of traffic – it landed on Hacker News, took down my site...
over a year ago
8
over a year ago
My blog post about fonts generated lots of traffic – it landed on Hacker News, took down my site while I was sleeping, and then obviously vanished from HN before I woke up. But it also got retweeted by a ton of people.
Home on Erik...
When machine learning matters I joined Spotify in 2008 to focus on machine learning and music recommendations. It's easy to...
over a year ago
8
over a year ago
I joined Spotify in 2008 to focus on machine learning and music recommendations. It's easy to forget, but Spotify's key differentiator back then was the low-latency playback. People would say that it felt like they had the music on their own hard drive.
Home on Erik...
Google diversity memo, global warming, Pascal's wager, and other stuff There's about 765 million blog posts about the diversity “memo” that leaked out of Google a couple...
over a year ago
8
over a year ago
There's about 765 million blog posts about the diversity “memo” that leaked out of Google a couple of weeks ago. I think the case for any biological difference is pretty weak, and it bothers me when people refer to an “interest gap” as anything else than caused by the...
Home on Erik...
Momentum and mean reversion might just be volatility bias The Economist just published an article called The best, the worst and the ugly. By looking at...
over a year ago
8
over a year ago
The Economist just published an article called The best, the worst and the ugly. By looking at historical performance for mutual funds, they find strong support for momentum and mean reversion. Picking the best or the worst fund over the previous five years gives great returns...
Home on Erik...
Luigi conquering the world I keep forgetting to buy a costume for Halloween every year, so this year I prepared and got myself...
over a year ago
8
over a year ago
I keep forgetting to buy a costume for Halloween every year, so this year I prepared and got myself a Luigi costume a month in advance. Only to realize I was going to be out of town the whole weekend.
Home on Erik...
Detecting corporate fraud using Benford's law Note: This is a silly application. Don't take anything seriously. Benford's law describes a...
over a year ago
8
over a year ago
Note: This is a silly application. Don't take anything seriously. Benford's law describes a phenomenon where numbers in any data series will exhibit patterns in their first digit. For instance, if you took a list of the 1,000 longest rivers of Mongolia, or the average daily...
Home on Erik...
Subway waiting math Why does it suck to wait for things? In a previous post I analyzed a NYC subway dataset and found...
over a year ago
8
over a year ago
Why does it suck to wait for things? In a previous post I analyzed a NYC subway dataset and found that at some point, quite early, it's worth just giving up. This isn't a proof that the subway doesn't run on time – in fact it might actually prove that the subway runs really...
Home on Erik...
Software infrastructure 2.0: a wishlist Software infrastructure (by which I include everything ending with *aaS, or anything remotely...
over a year ago
8
over a year ago
Software infrastructure (by which I include everything ending with *aaS, or anything remotely similar to it) is an exciting field, in particular because (despite what the neo-luddites may say) it keeps getting better every year! I love working with something that moves so...
Home on Erik...
Annoy – now without Boost dependencies and with Python 3 Support Annoy is a C++/Python package I built for fast approximate nearest neighbor search in high...
over a year ago
8
over a year ago
Annoy is a C++/Python package I built for fast approximate nearest neighbor search in high dimensional spaces. Spotify uses it a lot to find similar items. First, matrix factorization gives a low dimensional representation of each item (artist/album/track/user) so that every item...
Home on Erik...
Optimizing things: everything is a proxy for a proxy for a proxy Say you build a machine learning model, like a movie recommender system. You need to optimize for...
over a year ago
8
over a year ago
Say you build a machine learning model, like a movie recommender system. You need to optimize for something. You have 1-5 stars as ratings so let's optimize for mean squared error. Great. Then let's say you build a new model.
Home on Erik...
Where do locals go in NYC? One obvious thing to anyone living in NYC is how tourists cluster in certain areas. I was curious...
over a year ago
8
over a year ago
One obvious thing to anyone living in NYC is how tourists cluster in certain areas. I was curious about the larger patterns around this, so I spent some time looking at data. The thing I wanted to understand is: what areas are dominated by tourists?
Home on Erik...
ML at Twitter I recently came across this paper describing how they do ML at Twitter. TL;DR Their approach is...
over a year ago
8
over a year ago
I recently came across this paper describing how they do ML at Twitter. TL;DR Their approach is pretty interesting. Everything is a Pig workflow and then they do everything as UDFs. This approach seems pretty interesting.
Home on Erik...
Mortality statistics and Sweden's "dry tinder" effect We live in a year of about 350,000 amateur epidemiologists and I have no desire to join that “club”....
over a year ago
8
over a year ago
We live in a year of about 350,000 amateur epidemiologists and I have no desire to join that “club”. But I read something about COVID-19 deaths that I thought was interesting and wanted to see if I could replicate it through data.
Home on Erik...
We're hiring at Better Just a quick note that my team is always hiring at Better. A lot of new people have been joining the...
over a year ago
7
over a year ago
Just a quick note that my team is always hiring at Better. A lot of new people have been joining the team here in NYC lately—the tech team has actually grown from 35 to 60 in just ~3 months.
Home on Erik...
coin2dice Here's a problem that I used to give to candidates. I stopped using it seriously a long time ago...
over a year ago
7
over a year ago
Here's a problem that I used to give to candidates. I stopped using it seriously a long time ago since I don't believe in puzzles, but I think it's kind of fun. Let's say you have a function that simulates a random coin flip.
Home on Erik...
Recurrent Neural Networks for Collaborative Filtering I’ve been spending quite some time lately playing around with RNN’s for collaborative filtering....
over a year ago
7
over a year ago
I’ve been spending quite some time lately playing around with RNN’s for collaborative filtering. RNN’s are models that predict a sequence of something. The beauty is that this something can be anything really – as long as you can design an output gate with a proper loss function,...
Home on Erik...
Optimizing over multinomial distributions Sometimes you have to maximize some function $$ f(w_1, w_2, \ldots, w_n) $$ where $$ w_1 + w_2 +...
over a year ago
7
over a year ago
Sometimes you have to maximize some function $$ f(w_1, w_2, \ldots, w_n) $$ where $$ w_1 + w_2 + \ldots + w_n = 1 $$ and $$ 0 \le w_i \le 1 $$ . Usually, $$ f $$ is concave and differentiable, so there's one unique global maximum and you can solve it by applying gradient ascent.
Home on Erik...
What's Erik up to? I joined Better in early 2015 because I thought the team was crazy enough to actually change one of...
over a year ago
7
over a year ago
I joined Better in early 2015 because I thought the team was crazy enough to actually change one of the largest industries in the US. For six years, I ran the tech team, hiring 300+ people, probably doing 2,000+ interviews, and according to GitHub I added 646,941 lines of code...
Home on Erik...
Missing the point about microservices: it's about testing and deploying independently Ok, so I have to first preface this whole blog post by a few things: I really struggle with the...
over a year ago
7
over a year ago
Ok, so I have to first preface this whole blog post by a few things: I really struggle with the term microservices. I can't put my finger on exactly why. Maybe because the term is hopelessly ill-defined, maybe because it's gotten picked up by the hype train.
Home on Erik...
A neat little trick with time decay Something that pops up pretty frequently is to implement time decay, especially where you have...
over a year ago
7
over a year ago
Something that pops up pretty frequently is to implement time decay, especially where you have recursive chains of jobs. For instance, say you want to keep track of a popularity score. You calculate today's output by reading yesterday's output, discounting it by $$ \exp(-\lambda...
Home on Erik...
Toxic meeting culture I spent six years at a company that went from 50 people to 1500 and one contributing factor leading...
over a year ago
7
over a year ago
I spent six years at a company that went from 50 people to 1500 and one contributing factor leading to my departure was that I went from a “maker” to a person stuck in meetings every day.
Home on Erik...
Spotify's Discovery page The Discovery page, the new start page in Spotify, is finally out to a fairly significant percentage...
over a year ago
7
over a year ago
The Discovery page, the new start page in Spotify, is finally out to a fairly significant percentage of all users. Really happy since we have worked on it for the past six months. Here's a screen shot:
Home on Erik...
Calculating cosine similarities using dimensionality reduction This was posted on the Twitter Engineering blog a few days ago: Dimension Independent Similarity...
over a year ago
7
over a year ago
This was posted on the Twitter Engineering blog a few days ago: Dimension Independent Similarity Computation (DISCO). I just glanced at the paper, and there's some cool stuff going on from a theoretical perspective. What I'm curious about is why they didn't decide to use...
Home on Erik...
The hacker's guide to uncertainty estimates It started with a tweet: New years resolution: every plot I make during 2018 will contain...
over a year ago
7
over a year ago
It started with a tweet: New years resolution: every plot I make during 2018 will contain uncertainty estimates — Erik Bernhardsson (@bernhardsson) January 7, 2018 Why? Because I've been sitting in 100,000,000 meetings where people endlessly debate whether the monthly number of...
Home on Erik...
Books I consumed in 2017 Turns out having a toddler isn't super compatible with reading. I used to read ~100 books/year as a...
over a year ago
7
over a year ago
Turns out having a toddler isn't super compatible with reading. I used to read ~100 books/year as a teenager, but it has slowly deteriorated to maybe 20-30 books, at most. And I don't even finish all of them because life is too short!
Home on Erik...
Machine, Platform, Crowd I just bought Machine, Platform, Crowd: Harnessing Our Digital Future and discovered that it...
over a year ago
7
over a year ago
I just bought Machine, Platform, Crowd: Harnessing Our Digital Future and discovered that it mentions my blog – in particular the post When machine learning matters. Ok, I lied a little bit. I didn't discover it serendipitously.
Home on Erik...
NYC Machine Learning meetup From the NYC Machine Learning talk I had last week: Haven't looked at it yet except briefly....
over a year ago
7
over a year ago
From the NYC Machine Learning talk I had last week: Haven't looked at it yet except briefly. Unfortunately the quality isn't the best.
Home on Erik...
Business secrets from terrible people I get bored reading management books very easily and lately I've been reading about a wide range of...
over a year ago
7
over a year ago
I get bored reading management books very easily and lately I've been reading about a wide range of almost arbitrary topics. One of the lenses I tend to read through is to see different management styles in different environments.
Home on Erik...
Annoy 1.10 released, with Hamming distance and Windows support I've been a bit bad at posting things with a regular cadence lately, partly because I'm trying to...
over a year ago
7
over a year ago
I've been a bit bad at posting things with a regular cadence lately, partly because I'm trying to adjust to having a toddler, partly because the hunt for clicks has caused such a high bar for me that I feel like I have to post something Pulitzer-worthy.
Home on Erik...
Conversion rates – you are (most likely) computing them wrong How hard can it be to compute conversion rate? Take the total number of users that converted and...
over a year ago
7
over a year ago
How hard can it be to compute conversion rate? Take the total number of users that converted and divide it by the total number of users. Done. Except… it's a lot more complicated when you have any sort of significant time lag.
Home on Erik...
Software Engineers and Automation Every once in a while when talking to smart people the topic of automation comes up. Technology has...
over a year ago
7
over a year ago
Every once in a while when talking to smart people the topic of automation comes up. Technology has made lots of occupations redundant, so what's next? (Image: switchboard operator, a long time ago.) What about software engineers?
Home on Erik...
Luigi Presentation @ NYC Data Science, Dec 16, 2014 More Luigi presentations!
over a year ago
Home on Erik...
Microsoft's new marketing strategy: give up I think it's funny how MS at some point realized they are not the cool kids and there's no reason to...
over a year ago
7
over a year ago
I think it's funny how MS at some point realized they are not the cool kids and there's no reason to appeal to that target audience. Their new marketing strategy finally admits what's been long known: the correlation between “business casual” and using Microsoft products:
Home on Erik...
Never attribute to stupidity that which is adequately explained by opportunity cost Hanlon's razor is a classic aphorism I'm sure you have heard before: Never attribute to malice that...
over a year ago
7
over a year ago
Hanlon's razor is a classic aphorism I'm sure you have heard before: Never attribute to malice that which can be adequately explained by stupidity. I've found that neither malice nor stupidity is the most common reason when you don't understand why something is in a certain way.
Home on Erik...
What is your motivation? I've been trying to learn Clojure. I keep telling people I meet that I really want to learn Clojure,...
over a year ago
7
over a year ago
I've been trying to learn Clojure. I keep telling people I meet that I really want to learn Clojure, but still every night I can't get myself to spend time with it. It's unclear if I really want to learn Clojure or just want to have learned Clojure?
Home on Erik...
More MCMC – Analyzing a small dataset with 1-5 ratings I've been obsessed with how to iterate quickly based on small scale feedback lately. One awesome...
over a year ago
7
over a year ago
I've been obsessed with how to iterate quickly based on small scale feedback lately. One awesome website I encountered is Usability Hub which lets you run 5 second tests. Users see your site for 5 seconds and you can ask them free-form questions afterwards.
Home on Erik...
The hardest challenge about becoming a manager Note: this post is full of pseudo-psychology and highly speculative content. Like most fun stuff! I...
over a year ago
7
over a year ago
Note: this post is full of pseudo-psychology and highly speculative content. Like most fun stuff! I became a manager back in 2009. Being a developer is fun. You have this very tangible way to measure yourself.
Home on Erik...
Interviewing is a noisy prediction problem I have done roughly 2,000 interviews in my life. When I started recruiting, I had so much confidence...
over a year ago
7
over a year ago
I have done roughly 2,000 interviews in my life. When I started recruiting, I had so much confidence in my ability to assess people. Let me just throw a couple of algorithm questions at a candidate and then I'll tell you if they are good or not!
Home on Erik...
The mathematical principles of management I've read about 100 management books by now but if there's something that always bothered me it's...
over a year ago
7
over a year ago
I've read about 100 management books by now but if there's something that always bothered me it's the lack of first principles thinking. Basically it's a ton of heuristics. And heuristics are great, but when you present heuristics as true objectives, it kind of clouds the...
Home on Erik...
Looking for smart people I haven't mentioned what I'm currently up to. Earlier this year I left Spotify to join a small...
over a year ago
7
over a year ago
I haven't mentioned what I'm currently up to. Earlier this year I left Spotify to join a small startup called Better. We're going after one of the biggest industries in the world that also turns out to be completely broken.
Home on Erik...
Leaving Spotify February 6 was my last day at Spotify. In total I spent more than six years at Spotify and it was an...
over a year ago
7
over a year ago
February 6 was my last day at Spotify. In total I spent more than six years at Spotify and it was an amazing experience. I joined Spotify in Stockholm in 2008, mainly because a bunch of friends from programming competitions had joined already.
Home on Erik...
Deep learning for… chess (addendum) My previous blog post about deep learning for chess blew up and made it to Hacker News and a couple...
over a year ago
7
over a year ago
My previous blog post about deep learning for chess blew up and made it to Hacker News and a couple of other places. One pretty amazing thing was that the Github repo got 150 stars overnight.
Home on Erik...
ML+Hadoop at NYC Predictive Analytics I was just at the NYC Predictive Analytics meetup talking about how we build machine learning...
over a year ago
7
over a year ago
I was just at the NYC Predictive Analytics meetup talking about how we build machine learning algorithms using Hadoop to power music recommendations. Great meetup, where we had two speakers, me and Blake Shaw from Foursquare.
Home on Erik...
I'm featured in Mashable This article from today in Mashable describes some of the fun stuff I get to work with: Erik...
over a year ago
7
over a year ago
This article from today in Mashable describes some of the fun stuff I get to work with: Erik Bernhardsson is technical lead at Spotify, where he helped to build a music recommendation system based on large-scale machine learning algorithms, mainly matrix factorization of big...
Home on Erik...
Buffet lines are terrible, but let's try to improve them using computer simulations My company has a buffet every Friday, and the lines grow to epic proportions when the food arrives....
over a year ago
7
over a year ago
My company has a buffet every Friday, and the lines grow to epic proportions when the food arrives. I've suspected for years that the “classic” buffet line system is a deeply flawed and inefficient method, and every time I'm stuck in the line has made me more convinced.
Home on Erik...
Domains for sale Contact me at mail at erik bern dot com!
over a year ago
Home on Erik...
Giving more tools to software engineers: the reorganization of the factory It's a popular attitude among developers to rant about our tools and how broken things are. Maybe...
over a year ago
7
over a year ago
It's a popular attitude among developers to rant about our tools and how broken things are. Maybe I'm an optimistic person, because my viewpoint is the complete opposite! I had my first job as a software engineer in 1999, and in the last two decades I've seen software engineering...
Home on Erik...
Bagging as a regularizer One thing I encountered today was a trick using bagging as a way to go beyond a point estimate and...
over a year ago
6
over a year ago
One thing I encountered today was a trick using bagging as a way to go beyond a point estimate and get an approximation for the full distribution. This can then be used to penalize predictions with larger uncertainty, which helps reduce false positives.
Home on Erik...
Analyzing 50k fonts using deep neural networks For some reason I decided one night I wanted to get a bunch of fonts. A lot of them. An hour later I...
over a year ago
6
over a year ago
For some reason I decided one night I wanted to get a bunch of fonts. A lot of them. An hour later I had a bunch of scrapy scripts pulling down fonts and a few days later I had more than 50k fonts on my computer.
Home on Erik...
The lane next to you is more likely to be slower than yours Saw this link on Hacker News the other day: The Highway Lane Next to Yours Isn’t Really Moving Any...
over a year ago
6
over a year ago
Saw this link on Hacker News the other day: The Highway Lane Next to Yours Isn’t Really Moving Any Faster. The article describes a phenomenon unique to traffic where cars spread out when they go fast and get more compact when they go slow.
Home on Erik...
Lessons from content marketing myself (aka blogging) for five years I started writing this blog in late 2012, partly because I felt like it would help me improve my...
over a year ago
6
over a year ago
I started writing this blog in late 2012, partly because I felt like it would help me improve my English and my writing skills, partly because I kept having a lot of random ideas in my head and I wanted to write them down somewhere.
Home on Erik...
State drift I generally haven't written much about software architecture. People make heuristics into religion....
over a year ago
6
over a year ago
I generally haven't written much about software architecture. People make heuristics into religion. But here is something I thought about: how to build self-correction into systems. This has been something just vaguely sitting in my head lacking a clear conceptual definition...
Home on Erik...
Momentum strategies
over a year ago
Haven't posted anything in ages, so here's a quick hack I threw together in Python on a Sunday night. Basically I wanted to know whether momentum strategies work well for international stock indexes. I spent a bit of time putting together a strategy that buys the stock index if...
Home on Erik...
On the Equifax breach and how to really prevent identity theft
over a year ago
A funny thing about being a foreigner is how you realize people take broken things for granted. I'm going to go out on a limb here claiming that the US has a pretty dumb banking system.
Home on Erik...
Building a data team at a mid-stage startup: a short story
over a year ago
I guess I should really call this a parable. The backdrop is: you have been brought in to grow a tiny data team (~4 people) at a mid-stage startup (~$10M annual revenue), although this story could take place at many different types of companies.
Home on Erik...
HubSpot's Picture Shows how to Maintain Monocultures in the 21st Century
over a year ago
I thought this article about the company culture at HubSpot is kind of funny. “HubSpot's Awesome Presentation Shows how to Create a 21st Century Culture”. Just FYI: You're not different. You're a bunch of white hipsters aged 25-30 dressed up in the same theme.
Home on Erik...
Delivering Music Recommendations
over a year ago
I've turned into a lazy bastard and I'm just posting presentations on this blog, but here's one from Rohan Singh at Spotify talking about the backend infrastructure of the Discover page.
Home on Erik...
Headcount goals, feature factories, and when to hire those mythical 10x people
over a year ago
When I started building up a tech team for Better, I made a very conscious decision to pay at the high end to get people. I thought this made more sense: they cost a bit more money to hire, but their output usually more than compensates for it.
Home on Erik...
Iterate or die
over a year ago
Here's a conclusion I've made building consumer products for many years: the speed at which a company innovates is limited by its iteration speed. I don't even mean throughput here. I just mean the cycle time.
Home on Erik...
How to set compensation using commonsense principles
over a year ago
Compensation has always been one of the most confusing parts of management to me. Getting it right is obviously extremely important. Compensation is what drives our entire economy, and you could look at the market for labor as one gigantic resource-allocating machine in the same...
Home on Erik...
Top posts
over a year ago
These are some blog posts which have gotten a disproportionate amount of traffic (10,000+ page views): 2024: It's hard to write code for computers, but it's even harder to write code for humans; 2023: Simple sabotage for software; 2022: We are still early with the cloud: why...
Home on Erik...
Wikiphilia
over a year ago
I've been obsessed with Wikipedia for the past ten years. Occasionally I find some good articles worth sharing and that's why I created the wikiphilia Twitter handle. Just a long stream of stuff that for one reason or another may be interesting.
Home on Erik...
Fun with trigonometry: the world's most twisted coastline
over a year ago
I just spent a few days in Italy, on the Ligurian coast. Even though we were on the west side of Italy, the Mediterranean Sea was to the east, because the house was situated on a long bay.
Home on Erik...
Why organizations fail
over a year ago
One of my favorite business hobbies is to reduce some nasty decision down to its absolute core objective, decide the most basic strategy, and then add more and more modifications as you have to confront the complexity of reality (yes I have very lame hobbies thanks I know).
Home on Erik...
Music recommendations using cover images (part 1)
over a year ago
Scrolling through the Discover page on Spotify the other day it occurred to me that the album is in fact a fairly strong visual proxy for what kind of content you can expect from it. I started wondering if the album cover can in fact be used for recommendations.
Home on Erik...
Why I went into the mortgage industry
over a year ago
I just realized last Thursday that I have spent two full years at Better, incidentally on the same day as we announced a $15M round led by Kleiner Perkins. So it was a good point to reflect a bit and think back – what the F led me to abandon my role managing the machine learning...
Home on Erik...
More Luigi!
over a year ago
Continuing in the same spirit of shameless self-promotion, here's some recent Luigi press: Reddit thread; A Guide to Python Frameworks for Hadoop (slides from the NYC Hadoop User Group); This presentation from the Open Analytics NYC meetup about how Foursquare uses Luigi; Luigi...
Home on Erik...
Storm in the stratosphere: how the cloud will be reshuffled
over a year ago
Here's a theory I have about cloud vendors (AWS, Azure, GCP): Cloud vendors will increasingly focus on the lowest layers in the stack: basically leasing capacity in their data centers through an API. Other pure-software providers will build all the stuff on top of it.
Home on Erik...
Books I read in 2015
over a year ago
Early last year when I left Spotify I decided to do more reading. I was planning to read at least one book per week and in particular I wanted to brush up on management, economics, and technology.