Statistics, polling, and getting your geek on...

Devildoc

Good article from 538 on the predictive nature of primary polls (I know @Salt USMC has quoted them a time or two, it's a great site). You can apply the same statistical methods to much more than politics. In this case, there is a strong statistical correlation between primary polls and who ends up as a party's nominee. I might start applying this to the current crop of politicos running for 2020 and see how it extrapolates.

We Analyzed 40 Years Of Primary Polls. Even Early On, They’re Fairly Predictive.
 
I don't recall that he had great polling in the primaries though. Do you?
Primaries are notoriously difficult to poll, according to what Nate Silver says. Mostly because polls are done infrequently, and it’s difficult to identify likely voters to build your model.
 
Statistics are great when they address facts:

There are 24 blue sedans in the parking lot.
There are 17 blue coupes in the parking lot.
There are 31 blue pickup trucks in the parking lot.
There are 11 blue SUVs in the parking lot.
Graphed out over 125 total vehicles in the parking lot, it is statistically safe to say most people using that parking lot like blue automobiles.
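
For anyone who wants to check the arithmetic behind that claim, here is a minimal sketch in Python. It uses only the counts listed above; the variable names are made up for illustration.

```python
# Counts of blue vehicles, taken from the list above.
blue_counts = {"sedans": 24, "coupes": 17, "pickups": 31, "SUVs": 11}
total_vehicles = 125

blue_total = sum(blue_counts.values())
blue_share = blue_total / total_vehicles

# prints "83 of 125 vehicles are blue (66.4%)"
print(f"{blue_total} of {total_vehicles} vehicles are blue ({blue_share:.1%})")
```

That is a majority of the vehicles. Whether the owners actually like blue is an inference the counts alone don't settle.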

Put statistics on top of politics and it becomes a load of crap. It's why the Shy Tory Factor came into existence. It's how Bill Clinton's wife carried more than a 90% chance of victory into the voting booth and, before she threw back her last double-shot of loser vodka that night, had lost by 74 electoral votes.

How long have Republicans been beating their wives?
Do you think Donald Trump is a bad president, a horrible president, or an alien from planet Nibiru?
Do you think Bill Clinton's wife is good for the country, great for the country, or the goodest most greatest candidate ever?
Do you identify as a Democrat or a racist?
Would you consider yourself to be a liberal or a sexist?
Do you think guns should be banned, outlawed, or confiscated, and thrown into a furnace along with their owners?
...let's tally up those answers - annnnnd yep, the statistics prove that Americans hate Donald Trump.
 

That depends on whether you are on the right or left. If you are on the left, it is statistically safe to say that the owner of the parking lot discriminates against people that don't have blue cars...
 
@Box, for statistics to be scientifically valid, they have to look at facts. While you certainly have polls re: "which flavor ice cream do you like," the results are essentially meaningless unless you extrapolate and control for variables. Even then, all any poll can do, or any statistical value can give you, is a probability. Statistics cannot "prove" or "disprove" anything. That's one thing I hate about exit polling: I always lie. You can't control for that.

See, this is why I talked about getting your geek on...
 

“To the owner of the blue vehicle, it’s being towed.”
 
Talk about getting your geek on:

Data: One of those above polls mentioned likely voters. What in the hell makes a likely voter? Do we run a regression of voter participation rates per party, broken out by demographic across multiple categories, just to get a usable confidence interval for each one and figure out a statistically appropriate number of people to sample? Should we use simple or stratified random sampling to get the least biased data for our needs? Or do we just include a question in the poll, "Do you plan on voting in the next election?", and throw out the no's? What if there are many flavors of Democrats and Republicans? What if most of the nonresponders belong to a single party and we have a nonresponse bias in the data? How big is that error? Would things change if our sample had an even number of rural and urban participants, because god knows rural people never get polled?
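
To make the simple-versus-stratified question concrete, here is a rough sketch in Python. Everything in it is hypothetical: the 20/80 rural/urban split, the support rates, and the sample size of 500 are made-up numbers for illustration, not anything taken from a real poll.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: (stratum size, made-up support rate).
STRATA = {"rural": (2_000, 0.65), "urban": (8_000, 0.45)}

# Each person is a (stratum, answered_yes) pair.
population = [
    (stratum, random.random() < rate)
    for stratum, (size, rate) in STRATA.items()
    for _ in range(size)
]

def support_share(sample):
    """Fraction of respondents in the sample who answered yes."""
    return sum(answer for _, answer in sample) / len(sample)

def simple_random_sample(pop, n):
    """Draw n people uniformly at random, ignoring strata."""
    return random.sample(pop, n)

def stratified_sample(pop, n):
    """Draw from each stratum in proportion to its share of the population."""
    sample = []
    for stratum in STRATA:
        members = [p for p in pop if p[0] == stratum]
        k = round(n * len(members) / len(pop))
        sample.extend(random.sample(members, k))
    return sample

srs = simple_random_sample(population, 500)
strat = stratified_sample(population, 500)

print(f"population:           {support_share(population):.3f}")
print(f"simple random sample: {support_share(srs):.3f}")
print(f"stratified sample:    {support_share(strat):.3f}")
```

The stratified draw always contains exactly 100 rural and 400 urban respondents, while the simple random draw gets whatever mix chance hands it. Neither approach does anything about nonresponders or people who lie to the pollster.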

Model: Are our questions unbiased enough? Are we asking the right questions? Do we use a Likert scale and, if so, what is the scale? What is our margin of error in how we expect these questions to go? The questions should then be tested for validity, but in practice they never are because that costs a lot of money, so we go with best guess. Then, how do we structure the model and account for statistical significance in our predictions? Are the residuals homoscedastic or heteroscedastic? What if our model isn't exactly significant, but we mainly write about the descriptive statistics and make our predictions off of that? Have we tested and controlled for endogeneity?
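
On the margin of error question, the textbook number quoted with most polls comes from the normal approximation for a proportion. A minimal sketch in Python, assuming a simple random sample and honest answers, with a hypothetical sample of 1,000 people split 50/50:

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of the normal-approximation interval for a proportion.

    p: observed proportion, n: sample size, z: critical value (1.96 ~ 95%).
    """
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical poll: 1,000 respondents, 50% answering yes.
moe = margin_of_error(p=0.5, n=1000)
print(f"+/- {moe:.1%}")  # prints "+/- 3.1%"
```

This covers sampling error only. Biased questions, weighting, and likely-voter screens all sit outside the formula.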

The point is the people who make these polls follow one of two paths: 1) they don't ask themselves a lot of these questions because they are trying to make money, not write a dissertation, so they make their best educated guesses; or 2) they actually know all of the reasons why the poll should have a larger margin of error than what the data states, but that never ever makes it into an article because, well, it doesn't generate clicks. The people that work for these places know all of this and far more. I don't know how rigorously they examine every statistical aspect, but you can be damn sure that an unpaid intern did as much of the data entry/manipulation as possible. You'll never hear about the weaknesses of a given poll in an article, and they rarely publish their data for replication (some places do, but they are rarely the ones cited in mainstream media).
 