This issue pertains to Chapter 10 and its source code in variability.py, which estimates distributions for the mean and standard deviation of male and female heights, then uses those distributions to compute distributions for the coefficient of variation (CV) of each group. A key result seems to be that the CV for females is higher than that for males. However, if you remove the jittering that is applied to the original heights, this result seems to be reversed. variability.py line 462 applies "jittering" to the list of heights.
I also modified line 266 so that it prints a label next to each posterior mean.
If you run the script with jittering, you see that the coefficient of variation for females is greater than that of males, which matches the book's result.
$ python variability.py
...
female CV posterior mean 0.04379422911488041
male CV posterior mean 0.04151490569938492
...
female bigger 1.0000000000000628
male bigger 0
The resulting plot also matches the one in the book.
Now if you comment out line 462 (the jittering) and re-run the script, you see that the posterior mean of the coefficient of variation is non-negligibly higher for males.
$ python variability.py
...
male CV posterior mean 0.042135070189436574
female CV posterior mean 0.039877437544664336
...
female bigger 0
male bigger 1.0000000000000615
The resulting plot reflects this result.
My instinct is to trust the second result, as it uses the data in its raw form. Still, it would be nice to understand how this simple jittering can cause such a drastic difference in the coefficient of variation.
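One way to start probing this is to look at what additive noise does to a sample CV in isolation. The sketch below is not the code from variability.py: the data are synthetic, the group parameters and the jitter width of 1.5 are made-up values, and `jitter` here is a hypothetical helper that adds Gaussian noise. It only illustrates that independent noise inflates the standard deviation while leaving the mean almost unchanged, so it tends to raise the CV, roughly from sigma/mu to sqrt(sigma^2 + s^2)/mu for noise of standard deviation s.

```python
import random
import statistics


def coefficient_of_variation(xs):
    """Sample standard deviation divided by the sample mean."""
    return statistics.stdev(xs) / statistics.mean(xs)


def jitter(xs, sigma, rng):
    """Hypothetical helper: add independent Gaussian noise (std dev sigma) to each value."""
    return [x + rng.gauss(0, sigma) for x in xs]


rng = random.Random(17)

# Synthetic heights in cm. These parameters are assumptions for illustration,
# not values taken from the book's dataset.
groups = {
    "group_a": [rng.gauss(178, 7.5) for _ in range(20000)],
    "group_b": [rng.gauss(163, 6.5) for _ in range(20000)],
}

results = {}
for name, heights in groups.items():
    raw_cv = coefficient_of_variation(heights)
    jittered_cv = coefficient_of_variation(jitter(heights, 1.5, rng))
    results[name] = (raw_cv, jittered_cv)
    print(name, "raw CV", raw_cv, "jittered CV", jittered_cv)
```

Running the same comparison on the real height lists, before they go into the posterior computation, might show whether the jitter changes the sample CVs directly or whether the difference arises later in the estimation.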
I'll post back if I can think of any solution or explanation to this problem.
Interesting. I will investigate as soon as I can, but it might be a little while.
Both distributions have some strange outliers, which have a disproportionate effect on the estimated CV. I might investigate whether something is going on there.
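As a rough illustration of why outliers matter here, the sketch below uses synthetic data only. The sample parameters and the ten extreme values are made up and are not the outliers in the actual dataset. It compares the sample CV with and without a handful of implausible heights.

```python
import random
import statistics


def coefficient_of_variation(xs):
    """Sample standard deviation divided by the sample mean."""
    return statistics.stdev(xs) / statistics.mean(xs)


rng = random.Random(3)

# Synthetic heights in cm; the parameters are assumptions for illustration.
heights = [rng.gauss(163, 6.5) for _ in range(5000)]

# Ten made-up extreme values, about 0.2% of the sample.
with_outliers = heights + [60.0] * 5 + [240.0] * 5

cv_clean = coefficient_of_variation(heights)
cv_outliers = coefficient_of_variation(with_outliers)
print("CV without outliers", cv_clean)
print("CV with outliers   ", cv_outliers)
```

Because the standard deviation depends on squared deviations, a few extreme values can move the CV by far more than their share of the sample would suggest.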