Voter turnout in the US is pretty low compared to other democracies. There are many reasons for this but among them is the fact that voting isn't as convenient as it could be in many parts of the country, in addition to overt efforts to suppress the vote by placing overly burdensome requirements on voting. Voting isn't always at the forefront of many people's minds and it can be easy to miss a registration deadline or to not realize that you don't have the proper ID to vote before it's too late. There is also some evidence that consolidation of polling places can lead to reduced turnout by making polling places more difficult to find and get to.
Election days are during the workweek. This makes it more difficult for anyone with a job to get to the polls, particularly if they use public transportation or are unable to find time during the workday to get away from work. These seemingly small deterrents to voting can add up to thousands of people not casting a ballot when they otherwise would have if it were more convenient. Over the next few months, we'll be doing several posts on trends in voter turnout in recent elections. It seems unlikely that we'll uncover any trend that is not already well known to political scientists or campaign strategists, but that isn't the point. Instead, the goal is to gradually build up data sets and analysis tools that can help citizens communicate with their local legislators and election administrators to improve voting access and voting rights. If this project ends up making it easier for advocates to address access issues in chronically underserved communities, promote voting by mail, or lobby their state legislators to pass same-day or automatic voter registration, then we'll call that a win.
Let's jump right into some data. We'll start things off by looking at presidential turnout by municipality in our home state of New Jersey. This will hardly even scratch the surface of possibilities for analyzing turnout, but it's a start. The municipality level turnout rates can be found on the NJ Department of State website in a series of PDFs organized by county. A script for downloading all these PDFs and extracting the turnout data can be found on GitHub. These results can then be merged with data from the census. In particular, we'll use municipality median household income from the American Community Survey. The relationship between median household income and municipality turnout is pretty strong (note the log scale).
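As a rough illustration of the merge step described above, here is a minimal sketch in plain Python. The municipality names and numbers are made-up toy values, not the real New Jersey data, and the helper names (`merge_on_municipality`, `pearson`) are ours for illustration rather than anything from the actual script. It joins the two sources on municipality name and then checks how strongly turnout tracks income on the log scale.

```python
import math

# Made-up toy values for illustration only; NOT the real NJ turnout or ACS data.
turnout = {"Town A": 0.55, "Town B": 0.68, "Town C": 0.74, "Town D": 0.79}
median_income = {"Town A": 35_000, "Town B": 62_000, "Town C": 95_000, "Town E": 120_000}


def merge_on_municipality(turnout_by_town, income_by_town):
    """Inner join: keep only municipalities that appear in both sources."""
    shared = sorted(turnout_by_town.keys() & income_by_town.keys())
    return [
        {"municipality": name,
         "turnout": turnout_by_town[name],
         "income": income_by_town[name]}
        for name in shared
    ]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


merged = merge_on_municipality(turnout, median_income)

# Income goes on the log scale, matching the axis used in the figure.
log_income = [math.log10(row["income"]) for row in merged]
r = pearson(log_income, [row["turnout"] for row in merged])
```

Municipalities that appear in only one source (Town D and Town E here) are dropped by the inner join. With real data, name mismatches between the state PDFs and the census tables would likely need to be cleaned up before this step.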
It's important to note that income itself may not be directly responsible for differences in turnout. Other factors that correlate with both income and turnout, such as education, may be more directly responsible. Another possibility is that the income trend is better explained by factors like having an easier time getting to the polls due to owning a car or being able to take time out of the workday. It should also be mentioned that New Jersey is not a swing state, and a feeling that one's vote does not count due to the near certainty that the state would be won by Democrats could have an effect on turnout. In any case the trend is quite strong, and it is very interesting given that both parties love to claim that they are champions of the working and middle classes.
A lot of the responsibility for election administration falls on county offices, so if some parts of the state are doing a better or worse job at administering elections, then we should see some patterns in turnout by county. We want to know if, after accounting for the statewide trend of income on municipality turnout, there are systematic differences in turnout in municipalities in different counties. We can do this by fitting a multilevel regression model to the municipal turnout data. A regression model represents an estimate of a relationship between two variables, and a multilevel regression model estimates relationships in data with a hierarchical structure. In this case we have hierarchical data because there are turnout results on the municipal level, but knowing which county a municipality is in is an additional piece of relevant information which may be related to turnout.
Full details of the regression model can be found at the end of the post, and we'll just give a brief overview here. We use a varying intercept model for municipalities within counties with a constant slope representing the effect of municipality income on turnout. This assumes that the effect of income on turnout does not vary by county, which might not be true, but it is a reasonable starting point. The county regression intercepts represent the average turnout rate in a county's municipalities after accounting for the effect of income at the municipal level. We can then compare these intercepts to county level data to see if county level differences can be explained by factors such as county income or population.
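To make the "varying intercept, constant slope" idea concrete, here is a minimal sketch in plain Python. The county names, intercepts, slope, and centering value are hypothetical numbers chosen for illustration, not fitted estimates, and we assume income is log transformed first and then mean centered. Each county gets its own intercept, while every county shares one slope for income.

```python
import math


def logistic(x):
    """Map a real-valued linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical parameter values for illustration; NOT estimates from the model.
county_intercepts = {"County X": 0.9, "County Y": 0.5}  # one intercept per county
income_slope = 1.2                                      # shared by all counties
mean_log_income = math.log(75_000)                      # assumed centering constant


def expected_turnout(income, county):
    """Expected municipal turnout under a varying-intercept, constant-slope model."""
    centered_log_income = math.log(income) - mean_log_income
    return logistic(county_intercepts[county] + income_slope * centered_log_income)


# Two municipalities with the same income but in different counties.
same_income_x = expected_turnout(60_000, "County X")
same_income_y = expected_turnout(60_000, "County Y")
```

Because the slope is shared, the gap between counties comes entirely from the intercepts: at any given income, the county with the higher intercept has the higher expected turnout. The predictor is linear in log income, so plotting these predictions against income on a linear scale gives curved lines.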
The figure below shows the fitted regression lines (posterior mean estimate) for each county along with the turnout data. Unlike the first figure in this post, median household income is not on the log scale, which is why the lines are curved. There is a very rapid decline in turnout once median household income drops below 60-70 thousand or so, and the model does a reasonable job capturing this trend.
This figure is pretty cluttered, so to help illustrate the county level differences, here is the same figure showing only Morris, Essex, and Hunterdon counties:
Some clear differences exist, but what explains them? Hunterdon county has a small population and is relatively affluent. Essex has a higher population and is less affluent than Hunterdon on the county level, although it does contain several extremely affluent suburbs such as Essex Fells in addition to poorer cities like Newark. Note though, that municipalities with comparable incomes do not have the same turnout rate across counties, with the lowest turnout coming from Essex, the highest from Hunterdon, and Morris in between. It seems sensible that it would be more difficult to administer elections in populous counties or counties with fewer resources, but the model was not able to identify such county level trends with enough certainty for us to give them too much credibility.
Below we've plotted the trend in the county intercepts (which again reflect the county turnout rate after accounting for the effect of income on municipal turnout) with county income. The heavy red line represents the mean estimate for the trend. The positive slope means that county turnout is expected to increase with county level income. The thin wispy lines represent the uncertainty in the trend. Since the trend is relatively weak and the uncertainty high, we can't say with much statistical certainty that the trend is actually positive.
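One way to put a number on "we can't say for sure" is to look at the posterior draws of the county-level slope directly. The sketch below is only an illustration: the draws are synthetic values from a normal distribution standing in for real posterior samples, and the helper names are ours. It computes the fraction of draws with a positive slope and a central 95% interval.

```python
import random

random.seed(0)

# Synthetic stand-in for posterior draws of the county-level income slope.
# These are NOT samples from the fitted model.
slope_samples = [random.gauss(0.3, 0.4) for _ in range(4000)]


def prob_positive(samples):
    """Fraction of posterior draws in which the slope is above zero."""
    return sum(s > 0 for s in samples) / len(samples)


def central_interval(samples, mass=0.95):
    """Central interval containing roughly `mass` of the draws."""
    ordered = sorted(samples)
    lo_idx = int((1 - mass) / 2 * len(ordered))
    hi_idx = int((1 + mass) / 2 * len(ordered)) - 1
    return ordered[lo_idx], ordered[hi_idx]


p_pos = prob_positive(slope_samples)
lower, upper = central_interval(slope_samples)
```

If the interval spans zero, as it does for these synthetic draws, the data are compatible with both a positive trend and no trend at all, which is the situation the wispy lines in the figure are conveying.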
We see something similar when we consider the number of registered voters in each county: there is a trend that seems to follow intuition, but it is fairly weak and has high uncertainty. Finding which factors explain these differences in county level turnout might require individual level indicators such as age and party affiliation. Regardless of what's going on at the county level, the fact that turnout seems to drop off a cliff once median income falls below 60k or so is pretty alarming. It will take more work to determine how much of this trend can be attributed to voting access. Next time we'll take a look at some more detailed data from the Morris County voter rolls, so stay tuned!
Here are some more details of the model itself. We use logistic regression and Bayesian inference with PyMC3. Since municipal level voter turnout seems to top out at around 80%, we mix the logistic model with a guessing parameter which is inferred from the data along with the regression parameters. The guessing parameter also helps deal with outliers. County coefficients are drawn from a Student's t distribution whose mean is determined from a county level regression. All regression predictors (municipal income, county income, and number of registered voters in a county) are mean centered and log transformed. A diagram of the model in the style of the puppy book (Kruschke's Doing Bayesian Data Analysis) is shown below, where i indexes municipalities and j indexes counties. The full source code for this post is available on GitHub.
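To show how a guessing parameter can cap turnout below 100%, here is a minimal sketch in plain Python. It assumes the mixture form used for robust logistic regression in the puppy book, where the expected rate is a blend of a coin flip and the logistic curve. The exact form in the post's source code may differ, and the value 0.4 below is a hypothetical choice picked so that the ceiling lands at 80%.

```python
import math


def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def turnout_with_guessing(linear_predictor, guess):
    """Mix a 50/50 'guess' with the logistic curve (assumed Kruschke-style form).

    With weight `guess` the outcome is a coin flip, and with weight
    (1 - guess) it follows the logistic model. The result is bounded
    between guess / 2 and 1 - guess / 2.
    """
    return 0.5 * guess + (1.0 - guess) * logistic(linear_predictor)


guess = 0.4  # hypothetical value; in the model this is inferred from the data

ceiling = turnout_with_guessing(50.0, guess)   # very high income
floor = turnout_with_guessing(-50.0, guess)    # very low income
```

Under this assumed form, a guessing parameter of 0.4 squeezes predicted turnout into the range 20% to 80%, so even the most affluent municipality is never predicted to exceed 80%. The same squeeze keeps a handful of unusual municipalities from pulling the fitted slope too far, which is how the parameter helps with outliers.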