It's Time to End Gerrymandering V: Partisan Symmetry Continued

We previously discussed the notion of partisan symmetry in our series on gerrymandering, as well as statistical tests based on concepts of skew and spread which can be used to detect gerrymandering when it occurs. Here we'll try to put all these concepts together into a single statistical test, which ultimately turns out to be a small modification of the skew test proposed by Sam Wang. We'll also discuss some of the limitations of the statistical testing approach we have employed, and how those limitations could be remedied through more sophisticated methods.

First, as a reminder, the skew test is based on the difference between the mean and median, which is a measure of partisan symmetry. That difference is then divided by the standard deviation. We can then compare this result to a reference symmetric model (usually a standard normal distribution), and determine whether or not the mean median difference we observe in an election result is unusual. If the probability that an observed asymmetry occurred by chance is small (usually less than 5%) then we can say with a good degree of confidence that gerrymandering has occurred. 
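To make the recipe above concrete, here is a minimal sketch of the skew test in Python. It is an illustration under stated assumptions, not the exact procedure used by Sam Wang or in our earlier posts: the reference model is simulated by drawing the same number of districts from a standard normal distribution, the p-value is two-sided, and the district vote shares in `packed` are made up for the example.

```python
import random
import statistics


def skew_statistic(shares):
    """Mean median difference divided by the sample standard deviation."""
    return (statistics.mean(shares) - statistics.median(shares)) / statistics.stdev(shares)


def skew_p_value(shares, trials=5000, seed=0):
    """Two-sided Monte Carlo p-value against a symmetric reference model.

    Assumption: the reference model is n independent standard normal draws,
    where n is the number of districts. The statistic does not depend on
    location or scale, so the standard normal is enough.
    """
    rng = random.Random(seed)
    n = len(shares)
    observed = abs(skew_statistic(shares))
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(skew_statistic(sample)) >= observed:
            hits += 1
    return hits / trials


# Hypothetical two-party vote shares for one party in a 12-district state.
# Three districts are heavily packed; the other nine are narrow losses.
packed = [0.78, 0.75, 0.72, 0.44, 0.43, 0.42, 0.42, 0.41, 0.41, 0.40, 0.40, 0.39]

print("skew statistic:", round(skew_statistic(packed), 3))
print("approximate p-value:", skew_p_value(packed))
```

A small p-value here says only that the observed asymmetry would be unusual under the symmetric reference model. The 5% threshold is the conventional cutoff mentioned above, not something the code enforces.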

One of the shortcomings of the mean median difference is that it is most reliable when applied to closely divided states. The reason for this is that if a state is more partisan than 50-50, creating an asymmetry often results in a small but nonzero mean median difference which is not distinguishable from a chance occurrence. On the other hand, these states do tend to have two other characteristics to some degree. The first is tightly clustered vote percentages, which motivated the use of the tests based on spread in earlier posts. The second is an unusual number of districts that outperform the gerrymandering party's overall statewide level of support, which is another way to say that the results are lopsided around the mean. This is another way to measure asymmetry that is complementary to the mean median difference and can be tested using the Kolmogorov Smirnov test described in our post on partisan symmetry.

All three of these concepts can be encapsulated and visualized using a seats votes curve, which is the result of uniformly shifting vote percentages in a state's districts and recalculating the results. Examples of seats votes curves for all 50 states can be found here. The mean median difference is the horizontal asymmetry in the seats votes curve at 50% of the vote, while lopsidedness about the mean is the vertical asymmetry in the seats votes curve at 50% of the vote. Clustering results in some portion of the seats votes curve being highly volatile, meaning that small changes in the votes cause large shifts in the seat allocation. Take Pennsylvania as an example from the link above. It is a closely divided state in terms of partisanship, and is flagged as gerrymandered by the mean median difference test. In addition to showing horizontal asymmetry, it also has a substantial vertical asymmetry and volatility in the seats votes curve, and therefore has all of the fingerprints of gerrymandering that our statistical tests try to detect. Alabama, a more partisan state, does not show much horizontal asymmetry but has significant vertical asymmetry and volatility.
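The uniform shift behind a seats votes curve can be sketched in a few lines of Python. This is a simplified illustration, not the code used to produce the linked curves. It assumes equal turnout in every district, so the statewide vote share is just the unweighted mean of the district shares, and it uses made-up vote shares.

```python
def seat_share(shares, statewide_vote):
    """Fraction of seats won after shifting every district by the same amount.

    Assumption: equal turnout across districts, so the statewide share is
    the plain average of the district shares.
    """
    shift = statewide_vote - sum(shares) / len(shares)
    wins = sum(1 for s in shares if s + shift > 0.5)
    return wins / len(shares)


def seats_votes_curve(shares, steps=101):
    """List of (statewide vote share, seat share) points from 0% to 100%."""
    grid = [i / (steps - 1) for i in range(steps)]
    return [(v, seat_share(shares, v)) for v in grid]


# Hypothetical two-party vote shares for one party in a 12-district state.
packed = [0.78, 0.75, 0.72, 0.44, 0.43, 0.42, 0.42, 0.41, 0.41, 0.40, 0.40, 0.39]

curve = seats_votes_curve(packed)
# Vertical asymmetry at 50% of the vote: seat share minus one half.
print("seat share at an even vote:", seat_share(packed, 0.5))
```

In this made-up state, the packed party would win only 3 of 12 seats with exactly half of the vote, which is the vertical asymmetry described above. The steep part of the curve, where the nine clustered districts flip over a narrow range of vote shares, is the volatility.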

The beauty of the mean median difference test is its simplicity, but the loss of power for more partisan states is a problem. We can boost the power of this metric for measuring gerrymandering by dividing the mean median difference by the median absolute deviation (MAD) instead of the sample standard deviation (SD). There are a few reasons why the MAD is preferable to the SD for measuring gerrymandering. The first is that the MAD is less sensitive to outliers in the data than the SD, and is therefore more robust. Additionally, the larger the spread in a given set of data, the more difficult it becomes to distinguish small to moderate asymmetries from chance occurrences. However, measuring spread with the SD may overestimate the true spread of the results since the SD is sensitive to outliers. That means that one or two heavily packed districts might throw off the spread estimate for the entire state even if most districts are clustered closely. The MAD is a better estimate of the true spread in this scenario. Lastly, while it does not appear that there is a fundamental theoretical relationship between clustering and vertical asymmetry (lopsidedness about the mean), empirically they seem to go hand in hand. Below, we've taken the 2008, 2012, and 2016 presidential results computed at the district level (using the post 2010 census districts) and plotted the clustering vs the vertical asymmetry for each state and each election. The clustering is measured by SD/(1.4826*MAD) - 1 (for normally distributed data SD = 1.4826*MAD), and the vertical asymmetry is measured by the absolute value of 50% minus the percentage of seats in which the vote share is greater than the redistricting party's mean level of support.
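The three quantities in this paragraph are simple to compute. Here is a rough Python sketch with made-up vote shares. One choice in it is our assumption rather than something fixed by the discussion above: the MAD is multiplied by 1.4826 before dividing, so that the result is on the same scale as the SD version for normally distributed data.

```python
import statistics


def mad(values):
    """Median absolute deviation from the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def robust_skew(shares, scale=1.4826):
    """Mean median difference divided by the (scaled) MAD.

    Assumption: scale=1.4826 makes the MAD comparable to the SD for
    normally distributed data; pass scale=1.0 for the raw MAD.
    """
    diff = statistics.mean(shares) - statistics.median(shares)
    return diff / (scale * mad(shares))


def clustering(shares):
    """SD / (1.4826 * MAD) - 1, roughly zero for normally distributed data."""
    return statistics.stdev(shares) / (1.4826 * mad(shares)) - 1.0


def vertical_asymmetry(shares):
    """|50% - share of districts above the mean level of support|."""
    mean = statistics.mean(shares)
    above = sum(1 for s in shares if s > mean) / len(shares)
    return abs(0.5 - above)


# Hypothetical two-party vote shares for one party in a 12-district state.
packed = [0.78, 0.75, 0.72, 0.44, 0.43, 0.42, 0.42, 0.41, 0.41, 0.40, 0.40, 0.39]

print("MAD-based skew:", round(robust_skew(packed), 3))
print("clustering:", round(clustering(packed), 3))
print("vertical asymmetry:", vertical_asymmetry(packed))
```

In this example the three packed districts inflate the SD far more than the MAD, so the clustering measure is well above zero and the MAD-based statistic is much larger than its SD-based counterpart. Turning the statistic into a p-value still requires a comparison against a symmetric reference model.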

There are some exceptions, but generally larger vertical asymmetries are associated with tighter clustering, meaning that the SD will likely overestimate the spread in these states. Again, this is just an empirical relationship but it appears useful. Examining the p-values of the mean median difference test using SD and MAD shows that the MAD generally results in a stricter standard, though the values are similar in quite a few cases.

The map below shows the smallest p-value observed among the 2008, 2012, and 2016 elections using the mean median difference with the MAD. The test is capable of flagging both closely divided states and more partisan states.

The map shows only the smallest p-value found for each state over the 2008, 2012, and 2016 elections, but for the most part, states with a small p-value in one year have a small p-value every year. In particular, in 22 states the smallest p-value in any election was less than 5%. Of these, 17 did not have a p-value exceeding 20% in any of the three elections, 12 did not have a p-value exceeding 10% in any of the three elections, and 11 did not have a p-value exceeding 5% in any election. This means that large asymmetries tend to be durable, and therefore observing asymmetry in a single election gives us some amount of predictive capability for future elections. The same conclusion was reached using a different gerrymandering metric by the expert witness for the plaintiffs in Whitford v. Gill based on a much more comprehensive analysis of elections through U.S. history.

In many real world applications of statistical hypothesis testing, it is actually not very typical for a small p-value in one test to be predictive of small p-values in future tests. For example, if researchers are trying to determine whether or not a particular medical intervention has an effect on a group of patients, small p-values are frequently not replicated if an initially successful trial is repeated. There are a number of reasons for this, such as small sample sizes leading to increased uncertainty, and the difficulty in designing trials or experiments that can reliably distinguish between randomness and the effects we are interested in. Some more discussion of this can be found in this Nature article. The fact that we seem to have the opposite scenario when measuring gerrymandering means that an observed asymmetry in an electoral map is likely to be a real and persistent feature of that map, which justifies a court intervening.

This also suggests that one could use more information to develop more sophisticated statistical methods for evaluating gerrymandering. With the statistical hypothesis testing approach, we are comparing a given set of election results to a model that assumes that district results are independent and identically distributed. This is often good enough for determining whether or not an election result is consistent with the notion of symmetry, but district results are neither independent nor identically distributed in reality. An alternative approach is to use election results to create a probability model for each district individually, and then to evaluate that model to assess the fairness of a map. This type of model has a level of granularity not available in the hypothesis testing approach that allows seat counts to be predicted as well as various measures of asymmetry and fairness. If you are interested in such a model, look no further than the calculators from autoredistrict.org by Kevin Baas. This type of model takes advantage of the fact that voter preferences are fairly stable, and provides a direct way to evaluate the likelihood of a given map producing unfair outcomes in the future on a case by case basis. We know generally that asymmetries tend to be stable, but the statistical modeling approach allows us to characterize and quantify this tendency directly.
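To give a flavor of what a per-district probability model might look like, here is a deliberately simple Python sketch. It is not the model behind the autoredistrict.org calculators, which we have not reproduced here. It assumes each district's future vote share is normally distributed around its last result with a hypothetical swing of 5 points (`swing_sd=0.05`), and the vote shares are made up.

```python
from statistics import NormalDist


def win_probabilities(shares, swing_sd=0.05):
    """Probability that each district's vote share ends up above 50%.

    Assumption: future share ~ Normal(last share, swing_sd). The value of
    swing_sd is hypothetical and would need to be estimated from data.
    """
    return [1.0 - NormalDist(mu=s, sigma=swing_sd).cdf(0.5) for s in shares]


def expected_seats(shares, swing_sd=0.05):
    """Expected seat count; linearity of expectation means this holds
    even if the district outcomes are correlated."""
    return sum(win_probabilities(shares, swing_sd))


def expected_seat_share_at_even_vote(shares, swing_sd=0.05):
    """Expected seat share after shifting the state to a 50-50 average.

    Assumption: equal turnout, so the statewide share is the plain mean.
    """
    shift = 0.5 - sum(shares) / len(shares)
    shifted = [s + shift for s in shares]
    return expected_seats(shifted, swing_sd) / len(shares)


# Hypothetical two-party vote shares for one party in a 12-district state.
packed = [0.78, 0.75, 0.72, 0.44, 0.43, 0.42, 0.42, 0.41, 0.41, 0.40, 0.40, 0.39]

print("expected seats:", round(expected_seats(packed), 2))
print("expected seat share at an even vote:",
      round(expected_seat_share_at_even_vote(packed), 3))
```

An expected seat share well below one half at an even statewide vote is one way such a model can quantify unfairness directly. A more realistic model would also account for correlated swings across districts when estimating the full distribution of seat counts.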

In the end, as long as partisan state legislatures are in control of the redistricting process, they will try to use it to give themselves and their allies an advantage. Regardless of what approach for measuring gerrymandering is taken, the best we can do is put the brakes on how much gerrymandering a state legislature can get away with. A judicial standard for gerrymandering is needed desperately, but in addition the power of redistricting should be taken away from state legislatures and turned over to independent commissions. Use this letter and contact your state representatives and demand that your state establish an independent redistricting commission.