Submitted by Salil Mehta via Statistical Ideas blog,

It's an appealing chart from this week's Economic Policy Institute report, leveraging the statistics of the fashionable French economist Piketty, in order to illustrate how well the "top 1%" are doing in each of the 50 states. The report is provocatively titled: "The Increasingly Unequal States of America". But the report distorts the truth. An important matter affecting hundreds of millions should also include a straight acknowledgement of probability theory. We see through this article that, beyond the obvious national-level inequality (those at the top versus those at the bottom), targeting state-level differences in values is perverse. The latter is more a matter of probability theory, involving large sample sizes.

Let's start by looking at this chart below. It shows the differences in state-level ratios, contrasting the typical incomes of the top 1% with the typical incomes of the bottom 99%:

Everyone, from the press to news readers, gawks at how high each state's level is in relation to the levels of other arbitrary states. But this is irrational. Are liberal states such as California and New York twice as biased (or twice as unfair) as conservative states such as Arkansas and Maine? Of course not. But that's the poor logic one would draw from the former two states showing nearly twice the inequality values on the chart, versus the latter two states. Ex-post examination of living costs doesn't fully explain things either, as expenses are generally higher in states such as Vermont, Alaska, and Hawaii, versus the expenses in states such as Texas, Illinois, and most of Appalachia.

This week I devoted a couple of hours to spelling out to a confused Wall Street Journal writer how there is some pertinence here, related to probability theory.
Population size theoretically impacts these statistics, and only a small number of these states are grossly unequal enough to warrant exhibiting them through a charming, 50-state map. Whenever we are forced to explore state-level analysis though, it should be done through the prism of simply explaining relative variation, well beyond what random luck would suggest.

The most liberal people suggest that even thinking about this math is unnecessary. Perhaps any glorification of wrongs that need to be righted justifies the means that it would take to get there. Over time this can conflate math ideas with one's ideological bias. We must separate the discussion of national and structural inequality, among a population, from one where there is a perceived advantage for some groups relative to others. We can't prove the inappropriateness of inequality by looking at the differences in relative inequality between states.

We enjoy the right in the U.S. to pursue different outcomes. To take risks on the margins. We've been making many billions of these choices, across generations. This means that we always see some separation in outcomes, particularly among the largest populations. That's how probability theory impacts all of our lives, even in "equal" conditions. We would prefer a safety net against hard times. Yet we will still take risks such as how much and what we learn, what we feed our bodies, what we do on vacation, what financial investments we accept, when we plot a career change... the actions here can't be deemed some unfair inequality. They are the mystical elements we call life!

In a different context, we will always see these interesting differences (based upon population size alone), in areas that have nothing to do with inequality. Such as the state-level distribution of newborn baby sizes, or the performance distributions of high-school athletes. We'll mention others still later in this article.
And all of this collectively confirms our understanding that there is something important to the probability math, explaining the relative dispersions connected to population sizes.

Before moving too far ahead, let's first show that the Economic Policy Institute (EPI) chart above has an obvious concordance between income dispersion and the population size itself. We'll use simple arithmetic(!) as well, substituting for a complex probability area known as copula math (here, and here). If there were no connection between a state's inequality calculations and its population rank, then how many states would be in the top 10 of both? What about in the bottom 10 of both? The answers are quite low:

(10/50)*(10/50)*50 = 2 states in the top 10 of both variables
(10/50)*(10/50)*50 = 2 states in the bottom 10 of both variables

So only 4 (2+2) states total. But it's easy to verify from the chart that there is much more of a match among these variables than 4. Of the 10 most populated states, 5 were also among the top 10 "unequal" states: CA, TX, FL, NY, IL. Of the 10 least populated states, 4 were also among the 10 least "unequal" states: VT, AK, ME, HI. So instead of 4 overlapping states, we have a significantly higher 9 (5+4) states overlapping. Additionally, there are no crossover states (i.e., neither a highly "unequal" less-populated state, nor a less "unequal" highly-populated state).

The easy math (9>4 with no crossovers) shows something, and it's not structural inequality. The only common variable between the selection of the top 10 (and the selection of the bottom 10) populated states is just population size itself! Does population size coerce inequality? Again, no. Otherwise we could just split California into two smaller states, making citizens suddenly feel there is somehow "greater equality". Or we could reunite Virginia and West Virginia, making the new super-state's citizens feel there is magically "greater inequality".
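The overlap arithmetic above can be checked in a few lines. The sketch below reproduces the expected 2 states per list under independence, and then goes one step further than the article does: it assumes the second top-10 list is a uniformly random set of 10 states (a hypergeometric model, which is our assumption here and not part of the EPI report or the original calculation) to ask how often an overlap as large as the observed one would arise by luck. The function names are hypothetical.

```python
from math import comb


def expected_overlap(n_states: int = 50, k: int = 10) -> float:
    """Expected number of states in the top-k of two unrelated rankings.

    Algebraically the same as (k/n_states) * (k/n_states) * n_states.
    """
    return k * k / n_states


def prob_overlap_at_least(m: int, n_states: int = 50, k: int = 10) -> float:
    """P(overlap >= m), ASSUMING the second top-k list is a uniformly
    random k-subset of the states (hypergeometric tail probability)."""
    total = comb(n_states, k)
    favorable = sum(
        comb(k, j) * comb(n_states - k, k - j) for j in range(m, k + 1)
    )
    return favorable / total


if __name__ == "__main__":
    print("expected overlap per list:", expected_overlap())
    print("P(overlap >= 5) by luck:", prob_overlap_at_least(5))  # top-10 lists
    print("P(overlap >= 4) by luck:", prob_overlap_at_least(4))  # bottom-10 lists
```

Both tail probabilities come out small under this assumed model, which is consistent with the point that the match between population rank and "inequality" rank is more than coincidence. The top-10 and bottom-10 overlaps are not independent of each other, so the two probabilities should not simply be multiplied.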
But the statistical reasoning that redrawing state borders changes inequality is crazy. It leads one to think inequality can be solved with scissors and glue.

In our popular "Aristocrats in flyovers" article (a name suggesting easier state-level wealth in less-populated flyover states), we dig into the probability theory of extreme data. And there we continue to see this pattern show up repeatedly in diverse datasets. Such as the wealthiest individual per state, or the number of cumulative Miss America winners per state. Again this couldn't be coincidence, and we can also mathematically solve for the theoretical expected value of the most extreme individual under a parametric model, as we did in this article here.

Take a look at the bar chart below (of the top 1% income), and see in dots how tight the trend is for relative inequality, versus state population. We mathematically expect the more populated states to have considerably higher top 1% income (a double-digit percent increase!), versus the top 1% income in the less populated states, and this relates to the EPI chart above. Connecticut was the single, unreliable outlier removed, using a statistical process parallel to what others also do (notice Wyoming is missing in the aforementioned chart).

We also show a related transformed bar chart, below, instead fixating on changes in the relative standard confidence interval (as one moves across the chart from the less populated states to the most populated states).

We can now confirm that we mustn't ignore probability modeling as part of this story. We can't keep pretending that inequality values don't grow larger, at a diminishing rate, for the most populated states (a generally concave inequality dispersion function, similar to how it is with most economic data). Don't assume, as many lay people and activists do, that these are sampling errors that must vanish as the sample population sprouts into the millions.
That would be deceitful, and it would cause most people to pile further onto "research" similar to the EPI chart, falsely connecting most of the state-level calculation differences to genuine differences in inequality.

The conclusions of this article are again as pertinent for the top 1% in a population as they are for the most extreme person in any group. This is because the top 1% is still extreme enough along the probability distribution (from 0% to 100%) that larger populations will lead to larger top percentile thresholds, growing at a diminishing rate. Of course this is not true for sampling (Ch.5 in Statistics Topics) closer to the middle of a peer distribution (e.g., top 49%, or bottom 49%), where most of us in society have performed through the ages.
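The contrast between the most extreme person and the middle of the distribution can be illustrated with a small simulation. This is only a sketch under loud assumptions: every "state" draws its residents from the same standard normal distribution (so conditions are identical everywhere), the population sizes are toy values, and the function name is hypothetical. It models only the single largest draw and the median, not the top 1% threshold itself, and it is not the parametric calculation from the "Aristocrats in flyovers" article.

```python
import random
import statistics


def mean_max_and_median(n: int, trials: int = 50, seed: int = 0):
    """Average the largest draw and the median over repeated 'states' of size n.

    ASSUMPTION: all residents are i.i.d. standard normal, i.e. every state
    faces identical conditions and differs only in population size.
    """
    rng = random.Random(seed)
    maxima, medians = [], []
    for _ in range(trials):
        draws = [rng.gauss(0.0, 1.0) for _ in range(n)]
        maxima.append(max(draws))
        medians.append(statistics.median(draws))
    return statistics.mean(maxima), statistics.mean(medians)


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        avg_max, avg_median = mean_max_and_median(n)
        print(f"n={n:>6}  mean max={avg_max:.2f}  mean median={avg_median:.2f}")
```

In this toy setup the average maximum keeps rising as the population grows, with each tenfold increase adding less than the one before, while the median stays near zero regardless of size. That matches the concave pattern described for extremes and the stability described for the middle of the distribution.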