Probably Overthinking It

Bertrand’s Boxes
9 months ago
91
An early draft of Probably Overthinking It included two chapters about probability. I still think they are interesting, but the other chapters are really about data, and the examples in these chapters are more like brain teasers — so I’ve saved them for another book. Here’s an...
What does a confidence interval mean?
10 months ago
86
Here’s another installment in Data Q&A: Answering the real questions with Python. In general, I will try to focus on practical problems, but this one is a little more philosophical. What does a confidence interval mean? Here’s a question from the Reddit statistics...
Estimation with Small Samples
9 months ago
85
Here’s another installment in Data Q&A: Answering the real questions with Python. Previous installments are available from the Data Q&A landing page. Estimation with Small Samples: Here’s a question from the Reddit statistics forum. Hey, so imagine I only have 6...
What does “strength” mean?
10 months ago
85
Here’s another installment in Data Q&A: Answering the real questions with Python. Previous installments are available from the Data Q&A landing page. What does “strength” mean? Here’s a question from the Reddit statistics forum. I am currently doing a uni assignment...
Which Standard Deviation?
8 months ago
84
It’s another installment in Data Q&A: Answering the real questions with Python. Previous installments are available from the Data Q&A landing page. Which Standard Deviation? Here’s a question from the Reddit statistics forum. When do we use N and when N-1 for...
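The excerpt is cut off, but the N versus N-1 distinction it asks about can be sketched with Python's standard library. This is only an illustration, not the post's own code, and the sample values are made up.

```python
import statistics

# Hypothetical values, invented for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

# pstdev divides by N: the spread of exactly these values,
# treating them as the whole population.
std_n = statistics.pstdev(sample)

# stdev divides by N-1: the usual estimate when the values are
# a sample drawn from a larger population.
std_n_minus_1 = statistics.stdev(sample)

print(std_n, std_n_minus_1)
```

Dividing by N-1 always gives the larger value, and the gap shrinks as the sample grows.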
Destructive Testing
9 months ago
83
Here’s another installment in Data Q&A: Answering the real questions with Python. Previous installments are available from the Data Q&A landing page. Sample Size Selection: Here’s a question from the Reddit statistics forum. Hi Redditors, I am a civil engineer trying...
Where’s My Train?
7 months ago
80
Yesterday I presented a webinar for PyMC Labs where I solved one of the exercises from Think Bayes, called “The Red Line Problem”. Here’s the scenario: The Red Line is a subway that connects Cambridge and Boston, Massachusetts. When I was working in Cambridge I took the Red Line...
Regrets and Regression
7 months ago
78
It’s another installment in Data Q&A: Answering the real questions with Python. Previous installments are available from the Data Q&A landing page. Standardization and Normalization: Here’s a question from the Reddit statistics forum. I want to write a research...
Have the Nones Leveled Off?
7 months ago
78
Last month Ryan Burge published “The Nones Have Hit a Ceiling”, using data from the 2023 Cooperative Election Study to show that the increase in the number of Americans with no religious affiliation has hit a plateau. Comparing the number of Atheists, Agnostics, and “Nothing in...
Logarithms and Heteroskedasticity
9 months ago
78
Here’s another installment in Data Q&A: Answering the real questions with Python. Previous installments are available from the Data Q&A landing page. Logarithms and heteroskedasticity: Here’s a question from the Reddit statistics forum. Is it correct to use...
What is a percentile rank?
8 months ago
78
Here’s another installment in Data Q&A: Answering the real questions with Python. Previous installments are available from the Data Q&A landing page. What is a Percentile Rank? Here’s a question from the Reddit statistics forum. What’s the difference between...
Testing Percentiles
9 months ago
77
Here’s another installment in Data Q&A: Answering the real questions with Python. Previous installments are available from the Data Q&A landing page. Testing percentiles: Here’s a question from the Reddit statistics forum. I have two different samples (about 100...
Combining Risks
9 months ago
77
Here’s another installment in Data Q&A: Answering the real questions with Python. Previous installments are available from the Data Q&A landing page. Combining Risks: Here’s a question from the Reddit statistics forum. Bit of a weird one but I’m hoping you’re the...
Think Python Goes to Production
11 months ago
73
Think Python has moved into production, on schedule for the official publication date in July — but maybe earlier if things go well. To celebrate, I have posted the next batch of chapters on the new site, up through Chapter 12, which is about Markov text analysis and generation,...
Should divorce be more difficult?
8 months ago
72
“The Christian right is coming for divorce next,” according to this recent Vox article, and “Some conservatives want to make it a lot harder to dissolve a marriage.” As always when I read an article like this, I want to see data — and the General Social Survey has just the data I...
Too many bronze medals?
6 months ago
69
In a recent video, Hank Green nerd-sniped me by asking a question I couldn’t not answer. At one point in the video, he shows “a graph of the last 20 years of Olympic games showing the gold, silver, and bronze medals from continental Europe.” And it “shows continental Europe having...
Migration and Population Growth
8 months ago
68
On a recent run I was talking with a friend from Spain about immigration in Europe. We speculated about whether the population of Spain would be growing or shrinking if there were no international migration. I thought it might be shrinking, but we were not sure. Fortunately, Our...
The mean of a Likert scale?
9 months ago
67
Here’s another installment in Data Q&A: Answering the real questions with Python. Previous installments are available from the Data Q&A landing page. Likert scale analysis: Here’s a question from the Reddit statistics forum. I have collected data regarding how...
Ears Are Weird
5 months ago
66
In a previous article, I looked at 93 measurements from the ANSUR-II dataset and found that ear protrusion is not correlated with any other measurement. In a followup article, I used principal component analysis to explore the correlation structure of the measurements, and found...
Data Q&A
10 months ago
63
Today I’m starting a new project with the working title Data Q&A: Answering the real questions with Python. In each installment, I’ll take a question from Reddit’s statistics forum and answer it, using Python code to demonstrate. The first installment is a question about the...
The Political Gender Gap is Not Growing
a year ago
60
In a previous article, I used data from the General Social Survey (GSS) to see if there is a growing gender gap among young people in political alignment, party affiliation, or political attitudes. So far, the answer is no. Ryan Burge has done a similar analysis with data from...
Density and Likelihood
7 months ago
60
It’s another installment in Data Q&A: Answering the real questions with Python. Previous installments are available from the Data Q&A landing page. If you get this post by email, the formatting might be broken; if so, you might want to read it on the site. Density and...
Standard deviation of a count
10 months ago
60
This post is part of a new project with the working title Data Q&A: Answering the real questions with Python. In each installment, I’ll take a question from Reddit’s statistics forum and answer it, using Python code to demonstrate. My answer is in a Jupyter notebook — see the...
Elements of Data Science
7 months ago
58
I’m excited to announce the launch of my newest book, Elements of Data Science. As the subtitle suggests, it is about “Getting started with Data Science and Python”. Order now from Lulu.com and get 20% off! I am publishing this book myself, which has one big advantage: I can...
Bootstrapping a Proportion
4 months ago
57
It’s another installment in Data Q&A: Answering the real questions with Python. Previous installments are available from the Data Q&A landing page. Here’s a question from the Reddit statistics forum. How do I use bootstrapping to generate confidence intervals for a...
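The question is truncated here, so the following is only a minimal sketch of one common approach, a percentile bootstrap for a proportion, and not necessarily the method the post recommends. The function name `bootstrap_proportion_ci`, its parameters, and the counts (30 successes out of 100) are all hypothetical.

```python
import random


def bootstrap_proportion_ci(successes, n, iters=2000, level=0.9, seed=17):
    """Percentile-bootstrap interval for a proportion (illustrative sketch)."""
    rng = random.Random(seed)
    # Rebuild the observed sample as a list of 1s (successes) and 0s (failures).
    data = [1] * successes + [0] * (n - successes)
    # Resample with replacement and record the proportion in each resample.
    props = sorted(sum(rng.choices(data, k=n)) / n for _ in range(iters))
    # Take the percentiles that cut off the two tails of the resampled proportions.
    lo_idx = int((1 - level) / 2 * iters)
    hi_idx = int((1 + level) / 2 * iters) - 1
    return props[lo_idx], props[hi_idx]


# Hypothetical counts, chosen only to exercise the function.
low, high = bootstrap_proportion_ci(successes=30, n=100)
print(low, high)
```

For a simple proportion like this, the result should be close to what the normal approximation gives, which is one reason bootstrapping is often optional for this particular statistic.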
Political Alignment and Outlook
2 months ago
56
This is the fourth in a series of excerpts from Elements of Data Science, now available from Lulu.com and online booksellers. It’s from Chapter 15, which is part of the political alignment case study. You can read the complete chapter here, or run the Jupyter notebook on Colab....
Think Stats 3rd Edition
4 months ago
56
I am excited to announce that I have started work on a third edition of Think Stats, to be published by O’Reilly Media in 2025. At this point the content is mostly settled, and I am revising chapters to get them ready for technical review. If you want to start reading now, the...
Young Americans are Marrying Later or Never
2 months ago
56
I’ve written before about changes in marriage patterns in the U.S., and it’s one of the examples in Chapter 13 of the new third edition of Think Stats. My analysis uses data from the National Survey of Family Growth (NSFG). Today they released the most recent data, from surveys...
Rip-off ETF?
5 months ago
55
An article in a recent issue of The Economist suggests, right in the title, “Investors should avoid a new generation of rip-off ETFs”. An ETF is an exchange-traded fund, which holds a collection of assets and trades on an exchange like a single stock. For example, the SPDR S&P...
PMFs and PDFs
7 months ago
54
It’s another installment in Data Q&A: Answering the real questions with Python. Previous installments are available from the Data Q&A landing page. If you get this post by email, the formatting is not good; you might want to read it on the site. PMFs and PDFs: Here’s...
Reject Math Supremacy
2 months ago
53
The premise of Think Stats, and the other books in the Think series, is that programming is a tool for teaching and learning — and many ideas that are commonly presented in math notation can be more clearly presented in code. In the draft third edition of Think Stats there is...
Probably the Book
6 months ago
50
Last week I had the pleasure of presenting a keynote at posit::conf(2024). When the video is available, I will post it here. In the meantime, you can read the slides, if you don’t mind spoilers. For people at the conference who don’t know me, this might be a good time to...
Comparing Distributions
3 months ago
50
This is the second in a series of excerpts from Elements of Data Science, which is available from Lulu.com and online booksellers. It’s from Chapter 8, which is about representing distributions using PMFs and CDFs. This section explains why I think CDFs are often better for plotting...
Confidence In the Press
a month ago
47
This is the fifth in a series of excerpts from Elements of Data Science, now available from Lulu.com and online booksellers. It’s based on Chapter 16, which is part of the political alignment case study. You can read the complete example here, or run the Jupyter notebook on...
The Gender Gap in Political Beliefs Is Small
a year ago
45
In previous articles (here, here, and here) I’ve looked at evidence of a gender gap in political alignment (liberal or conservative), party affiliation (Democrat or Republican), and policy preferences. Using data from the GSS, I found that women are more likely to say they are...
Probably Overthinking It Notebooks
a year ago
43
To celebrate one month since the launch of Probably Overthinking It, I’m releasing the Jupyter notebooks I used to create the book. There’s one per chapter, and they contain all of the code I used to do the analysis and generate the figures. So if you are curious about the...
Political Alignment, Affiliation, and Attitudes
a year ago
43
Is there a growing gender gap in the U.S.? Alignment: A recent article in the Financial Times suggests that among young people there is a growing gender gap in political alignment on a spectrum from liberal to conservative. In last week’s post, I tried to replicate this result...
Hazard and Survival
2 months ago
42
Here’s a question from the Reddit statistics forum. If I have a tumor that I’ve been told has a malignancy rate of 2% per year, does that compound? So after 5 years there’s a 10% chance it will turn malignant? This turns out to be an interesting question, because the answer...
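The excerpt stops before the answer, so here is just the arithmetic behind the question under one simple reading: a constant 2% chance each year, with years treated as independent. The post may argue for a different interpretation; the variable names are illustrative.

```python
annual_rate = 0.02  # 2% per year, from the question
years = 5

# Naive answer: add the yearly rates.
naive = annual_rate * years

# Compounding answer: the chance of no event in any of the 5 years
# is 0.98 ** 5, so the chance of at least one event is its complement.
p_no_event = (1 - annual_rate) ** years
compound = 1 - p_no_event

print(f"naive: {naive:.4f}, compounded: {compound:.4f}")
```

Under these assumptions the compounded figure comes out a little under 10%, because each year's 2% applies only to the cases that have not already turned malignant.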
Download the World in Data
2 months ago
42
Our World in Data recently announced that they are providing APIs to access their data. Coincidentally, I am using one of their datasets in my workshop on time series analysis at PyData Global 2024. So I took this opportunity to update my example using the new API – this notebook...
Is the Ideology Gap Growing?
a year ago
41
This tweet from John Burn-Murdoch links to an article in the Financial Times (FT), “A new global gender divide is emerging”, which includes this figure: The article claims: In the US, Gallup data shows that after decades where the sexes were each spread roughly equally across...
What’s a Chartist?
3 months ago
40
Recently I heard the word “chartist” for the first time in my life (that I recall). And then later the same day, I heard it again. So that raises two questions: To answer the second question first, it’s someone who supported chartism, which was “a working-class movement for...
Multiple Regression with StatsModels
2 months ago
38
This is the third in a series of excerpts from Elements of Data Science, which is available from Lulu.com and online booksellers. It’s from Chapter 10, which is about multiple regression. You can read the complete chapter here, or run the Jupyter notebook on Colab. In the previous...
Smoking Causes Cancer
a year ago
38
In the preface of Probably Overthinking It, I wrote: Sometimes interpreting data is easy. For example, one of the reasons we know that smoking causes lung cancer is that when only 20% of the population smoked, 80% of people with lung cancer were smokers. If you are a doctor who...
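To see how strong the 20%/80% comparison is, Bayes's rule turns the two numbers from the quotation into a relative risk. This is a back-of-the-envelope sketch, not a calculation taken from the book, and it assumes the two percentages describe the same population.

```python
p_smoker = 0.20               # fraction of the population who smoked
p_smoker_given_cancer = 0.80  # fraction of lung cancer patients who smoked

# By Bayes's rule, P(cancer | group) = P(group | cancer) * P(cancer) / P(group).
# In the ratio between smokers and nonsmokers, P(cancer) cancels out.
smoker_term = p_smoker_given_cancer / p_smoker
nonsmoker_term = (1 - p_smoker_given_cancer) / (1 - p_smoker)
relative_risk = smoker_term / nonsmoker_term

print(f"relative risk: {relative_risk:.1f}")
```

With these inputs the ratio is 16: smokers are overrepresented among patients by a factor of 4, and nonsmokers are underrepresented by a factor of 4.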
Zipf’s Law
3 months ago
37
Elements of Data Science is in print now, available from Lulu.com and online booksellers. To celebrate, I’ll post some excerpts here, starting with one of my favorite examples, Zipf’s Law. You can read the complete chapter here, or run the Jupyter notebook on Colab. In almost any...
Think Python third edition!
a year ago
33
I am happy to announce the third edition of Think Python, which will be published by O’Reilly Media later this year. You can read the online version of the book here. I’ve posted the Preface and the first four chapters — more on the way soon! You can read the Early Release and...
Small percentiles and missing data
10 months ago
33
Here’s another installment in Data Q&A: Answering the real questions with Python. Previous installments are available from the Data Q&A landing page. Bootstrapping percentiles: Here’s a question from the Reddit statistics forum. I’m trying to figure out how to...
How Many Books?
a year ago
31
If you like this article, you can read more about this kind of Bayesian analysis in Think Bayes. Recently I found a copy of Probably Overthinking It at a local bookstore and posted a picture on Twitter. Aubrey Clayton replied with this question: It’s a great question with what...
Extremes, outliers, and GOATS
a year ago
26
The video from my PyData Global 2023 talk, Extremes, outliers, and GOATS, is available now: The slides are here. There are two Jupyter notebooks that contain the analysis I presented: Here’s the abstract: The fastest runners are much faster than we expect from a Gaussian...
The World Population Singularity
a year ago
22
One of the exercises in Modeling and Simulation in Python invites readers to download estimates of world population from 10,000 BCE to the present, and to see if they are well modeled by any simple mathematical function. Here’s what the estimates look like (aggregated on...
The Center Moves Faster Than You
a year ago
22
In May 2022, Elon Musk tweeted this cartoon: The creator of the cartoon, Colin Wright, explained it like this: At the outset, I stand happily beside ‘my fellow liberal,’ who is slightly to my left. In 2012 he sprints to the left, dragging out the left end of the political...
Algorithmic Fairness
a month ago
20
This is the last in a series of excerpts from Elements of Data Science, now available from Lulu.com and online booksellers. This article is based on the Recidivism Case Study, which is about algorithmic fairness. The goal of the case study is to explain the statistical arguments...
What are the odds?
a year ago
18
Whenever something unlikely happens, it is tempting to ask, “What are the odds?” In some very limited cases, we can answer that question. For example, if someone deals you five cards from a well-shuffled deck, and you want to know the odds of getting a royal flush, we can answer...
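The royal flush case is one of the few where the count is exact, so it can be worked out directly. This sketch assumes a standard 52-card deck and a five-card hand dealt without replacement.

```python
from fractions import Fraction
from math import comb

# Number of distinct five-card hands from a 52-card deck.
total_hands = comb(52, 5)

# One royal flush per suit: 10, J, Q, K, A of that suit.
royal_flushes = 4

probability = Fraction(royal_flushes, total_hands)
odds_against = total_hands // royal_flushes - 1

print(f"{total_hands} hands, probability {probability}, odds {odds_against} to 1 against")
```

Using `Fraction` keeps the result exact, which makes it easy to read off both the probability and the odds.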
We Have a Book!
a year ago
18
My copy of Probably Overthinking It has arrived! If you want a copy for yourself, you can get a 30% discount if you order from the publisher and use the code UCPNEW. You can also order from Amazon or, if you want to support independent bookstores, from Bookshop.org. The official...
The Overton Paradox in Three Graphs
a year ago
17
Older people are more likely to say they are conservative. And older people believe more conservative things. But if you group people by decade of birth, most groups get more liberal as they get older. So if people get more liberal, on average, why are they more likely to say...
Happy Launch Day!
a year ago
13
Today is the official publication date of Probably Overthinking It! You can get a 30% discount if you order from the publisher and use the code UCPNEW. You can also order from Amazon or, if you want to support independent bookstores, from Bookshop.org. I celebrated launch day by...
Life in a Lognormal World
a year ago
13
At PyData Global 2023 I will present a talk, “Extremes, outliers, and GOATs: On life in a lognormal world”. It is scheduled for Wednesday 6 December at 11 am Eastern Time. Here is the abstract: The fastest runners are much faster than we expect from a Gaussian distribution, and...
Why are you so slow?
a year ago
12
Recently a shoe store in France ran a promotion called “Rob It to Get It”, which invited customers to try to steal something by grabbing it and running out of the store. But there was a catch — the “security guard” was a professional sprinter, Méba Mickael Zeze. As you would...
Superbolts
a year ago
12
Probably Overthinking It is available to preorder now. You can get a 30% discount if you order from the publisher and use the code UCPNEW. You can also order from Amazon or, if you want to support independent bookstores, from Bookshop.org. Recently I read a Scientific American...
Another step toward a two-hour marathon
a year ago
11
This is an update to an analysis I run each time the marathon world record is broken. If you like this sort of thing, you will like my forthcoming book, Probably Overthinking It, which is available for preorder now. On October 8, 2023, Kelvin Kiptum ran the Chicago Marathon in...
What size is that correlation?
a year ago
9
This article is related to Chapter 6 of Probably Overthinking It, which is available for preorder now. It is also related to a new course at Brilliant.org, Explaining Variation. Suppose you find a correlation of 0.36. How would you characterize it? I posed this question to the...
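Two standard ways to put a correlation of 0.36 on a more interpretable scale are the share of variance explained and the reduction in prediction error from a linear fit. This is a sketch of that arithmetic only; the article may characterize the correlation differently.

```python
import math

r = 0.36  # the correlation from the question

# Coefficient of determination: share of variance in one variable
# accounted for by a linear fit on the other.
r_squared = r**2

# Relative reduction in RMSE from using the linear fit instead of
# always predicting the mean.
rmse_reduction = 1 - math.sqrt(1 - r_squared)

print(f"R^2: {r_squared:.4f}, RMSE reduction: {rmse_reduction:.3f}")
```

Both numbers are much smaller than 0.36 itself, which is why the same correlation can sound moderate or weak depending on the scale used to describe it.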
How Does World Population Grow?
a year ago
8
Recently I posed this question on Twitter: “Since 1960, has world population grown exponentially, quadratically, linearly, or logarithmically?” Here are the responses: By a narrow margin, the most popular answer is correct — since 1960 world population growth has been roughly...