About a week ago I wrote about how to estimate congressional district level opinion from the Cooperative Congressional Election Study, which is a large national opinion poll. I sent the post to Andrew Gelman, a statistician and political scientist, who gave me some great feedback on the methodology I used. In this post, I've revised my opinion model based on that feedback, and have made a few other enhancements as well. Compared to the previous model, this iteration produces very similar point estimates, but with reduced variance. Here I'll explain the changes and the reasoning behind them.
In the original post, I used a hierarchical Student's t distribution to model district level regression intercepts, with the standard deviation and degrees of freedom estimated from the data. I did not use a state or regional level regression as is typically done in the literature. There were two main reasons for this. The first is that I tended to get inefficient inferences when I included the state and regional level variation, which I blamed on correlations between the district level predictors and the state/regional regression indicators. However, at Andrew's suggestion I tried using more informative priors for the variance parameters in the model (for example, the standard deviation of the group distribution for the district level intercepts), and this problem resolved instantly.
To be more specific, I was originally using a half-Cauchy prior with a scale parameter of 5, which has a long tail and is more vague than necessary for this application. A half-normal prior with a standard deviation of 1 led to much better convergence. A look at the plots of these two distributions shows that the half-Cauchy is putting too much prior mass on large values for these variance parameters. Logistic regression coefficients tend to be small, and so the variance of the group distribution for a set of these coefficients will be even smaller. The half-normal keeps the prior mass in reasonable territory, since we are unlikely to see standard deviations that are much greater than one, but is not too restrictive either.
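To put rough numbers on that comparison, here is a minimal sketch that computes how much prior mass each distribution places above a few thresholds. It uses the closed-form tail probabilities of the two distributions rather than anything from my model code, and the function names and thresholds are just illustrative choices.

```python
import math


def half_normal_tail(x, sd):
    """P(X > x) for X ~ HalfNormal(sd), i.e. X = |Z| with Z ~ Normal(0, sd)."""
    return math.erfc(x / (sd * math.sqrt(2.0)))


def half_cauchy_tail(x, scale):
    """P(X > x) for X ~ HalfCauchy(scale)."""
    return 1.0 - (2.0 / math.pi) * math.atan(x / scale)


# Thresholds are arbitrary illustrative values for a group standard deviation.
for threshold in (1.0, 2.0, 5.0):
    hn = half_normal_tail(threshold, sd=1.0)
    hc = half_cauchy_tail(threshold, scale=5.0)
    print(f"P(sd > {threshold}): half-normal(1) = {hn:.3f}, half-Cauchy(5) = {hc:.3f}")
```

The half-Cauchy with scale 5 places half of its mass above 5, since its median equals its scale, while the half-normal with standard deviation 1 places only about 5% of its mass above 2.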
Including state level indicators will add some additional shrinkage to the regression model, where estimates of district level intercepts get pulled toward the state mean. Overall, I'm interested in finding topics which are polarizing by age, geography, education, and other factors. So my second objection to including state indicators was that I wanted to avoid a scenario where too much shrinkage gets applied to the district indicators, since I didn't want this to end up hiding trends inside heterogeneous or politically polarized states.
My original approach to this problem was to remove the state level indicators and use a robust district level regression with a t distribution. However, we can also handle this problem by keeping the state level indicators, but allowing for different district variances by state. We can go a step further with this approach by putting a hyperprior on the district variances, so that estimates from state to state mutually inform one another. This gives us the best of both worlds: information about the variance of district intercepts is shared between states, but we can still capture different variances in different states if the data support that.
In The Puppy Book, hierarchical gamma distributions are used in an example where variances differ by group, while Bayesian Data Analysis by Gelman et al. has an example with hierarchical half-Cauchy distributions. I've used hierarchical half-normal distributions, since the short tail seems to help for this problem. The prior for the standard deviation of each state's districts thus follows a half-normal distribution. The standard deviations of these distributions follow a higher level half-normal distribution, with the standard deviation set to 0.5. I used an analogous setup for four regions (Northeast, South, Midwest, and West), which are a level above states.
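The layering can be easier to see as a prior simulation. The sketch below is not my model code; it only draws from the priors described above, and it assumes a single shared hyperparameter (called `tau` here) that sets the scale of every state's half-normal prior. The function names, the seed, and the number of states are all hypothetical.

```python
import random


def half_normal(sd, rng):
    """Draw from HalfNormal(sd) by taking the absolute value of a Normal(0, sd) draw."""
    return abs(rng.gauss(0.0, sd))


def draw_district_sds(n_states, hyper_sd=0.5, seed=0):
    """Simulate the prior hierarchy for the district standard deviations.

    Assumption: one shared scale tau ~ HalfNormal(hyper_sd), and then one
    district standard deviation per state, sigma_s ~ HalfNormal(tau).
    """
    rng = random.Random(seed)
    tau = half_normal(hyper_sd, rng)
    state_sds = [half_normal(tau, rng) for _ in range(n_states)]
    return tau, state_sds


tau, state_sds = draw_district_sds(n_states=5)
print(f"shared scale tau = {tau:.3f}")
for i, sd in enumerate(state_sds):
    print(f"state {i}: district sd = {sd:.3f}")
```

Because every state's standard deviation is drawn around the same `tau`, data from one state that moves `tau` also moves the prior for every other state, which is how the states end up informing one another.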
Traces for these variance parameters and hyperparameters are shown below for the same question on white privilege I looked at in the previous post. Respondents were asked to rate the statement "White people in the U.S. have certain advantages because of the color of their skin" on a five-point scale: strongly agree, somewhat agree, neither agree nor disagree, somewhat disagree, or strongly disagree. I combined the two agree responses into one category and the remaining responses into a second. I ran three chains, which are combined to reduce clutter on the plots. On average, there is more variation of district intercepts within states than there is variation of state intercepts within regions.
The distributions of the district standard deviations also give us some interesting insight. A large standard deviation could indicate that after accounting for individual level characteristics, a state's districts are highly polarized. The posterior means along with 50% and 95% credible intervals are shown below for each state that has more than one district.
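For reference, a summary like this can be computed directly from the posterior draws of each state's standard deviation. This is a minimal sketch assuming the draws are available as a plain list of floats and that the intervals are central (equal-tailed) ones; `summarize` and `quantile` are hypothetical helper names, and the example draws are made up.

```python
import statistics


def quantile(sorted_draws, q):
    """Linear-interpolation quantile of pre-sorted draws, for 0 <= q <= 1."""
    pos = q * (len(sorted_draws) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_draws) - 1)
    frac = pos - lo
    return sorted_draws[lo] * (1.0 - frac) + sorted_draws[hi] * frac


def summarize(draws):
    """Posterior mean plus central 50% and 95% intervals for one parameter."""
    s = sorted(draws)
    return {
        "mean": statistics.fmean(s),
        "interval_50": (quantile(s, 0.25), quantile(s, 0.75)),
        "interval_95": (quantile(s, 0.025), quantile(s, 0.975)),
    }


# Made-up draws standing in for one state's district standard deviation.
example_draws = [0.21, 0.35, 0.28, 0.40, 0.31, 0.25, 0.37, 0.30, 0.33, 0.27]
print(summarize(example_draws))
```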
Generally, it looks like states with high standard deviations have a number of both urban and rural districts, which means the urban-rural divide could help explain some of the variation, although I think it's likely that there's a lot more to the story. Next I'll be trying a more complex set of individual level indicators to see if that ends up reducing some of the district level variation.