# Statistics, polling, and getting your geek on...



## Devildoc (Apr 24, 2019)

Good article on the predictive nature of primary polls from 538 (I know @Salt USMC has quoted them a time or two, it's a great site).  You can apply the same statistical methods to so much more than politics.  In this case, there is a strong statistical correlation between primary polls and who ends up as a party's nominee.  I might start applying this to the current crop of politicos running for 2020 and see how it extrapolates.

We Analyzed 40 Years Of Primary Polls. Even Early On, They’re Fairly Predictive.


----------



## Brill (Apr 24, 2019)

Where’s their 2016 poll?


----------



## BlackCloud (Apr 25, 2019)

Interesting...


----------



## Devildoc (Apr 25, 2019)

lindy said:


> Where’s their 2016 poll?



In statistics we'd call that "an outlier" lol....


----------



## Salt USMC (Apr 25, 2019)

FiveThirtyEight gave Trump a 30% chance of winning on election night - the highest percentage of any major statistical outfit.  They're legit.
Why FiveThirtyEight Gave Trump A Better Chance Than Almost Anyone Else


----------



## Devildoc (Apr 25, 2019)

Salt USMC said:


> FiveThirtyEight gave Trump a 30% chance of winning on election night - the highest percentage of any major statistical outfit.  They're legit.
> Why FiveThirtyEight Gave Trump A Better Chance Than Almost Anyone Else



I don't recall that he had great polling in the primaries though. Do you?


----------



## Salt USMC (Apr 25, 2019)

Devildoc said:


> I don't recall that he had great polling in the primaries though. Do you?


Primaries are notoriously difficult to poll, according to what Nate Silver says.  Mostly because polls are done infrequently, and it’s difficult to identify likely voters to build your model.


----------



## 757 (Apr 25, 2019)

During the last two elections I used the LA Times polling data and discovered it was one of the most accurate for presidential elections. For instance, on November 8th, the LA Times had Trump over Clinton by 3%.


----------



## compforce (Apr 25, 2019)

82% of all statistics are made up...


----------



## LibraryLady (Apr 25, 2019)

compforce said:


> 82% of all statistics are made up...


I thought it was 87%?

LL


----------



## compforce (Apr 25, 2019)

LibraryLady said:


> I thought it was 87%?
> 
> LL



That might be the number that I made up last time


----------



## LibraryLady (Apr 25, 2019)

compforce said:


> That might be the number that I made up last time


How dare you attempt to appropriate my reality?!?  

LL


----------



## Box (Apr 25, 2019)

Statistics are great when they address facts:

There are 24 blue sedans in the parking lot.
There are 17 blue coupes in the parking lot.
There are 31 blue pickup trucks in the parking lot.
There are 11 blue SUV's in the parking lot.
Graphed out over 125 total vehicles in the parking lot, it is statistically safe to say most people using that parking-lot like blue automobiles.
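For anyone who wants to check the tally, here is a quick sketch of the arithmetic using only the counts listed above. The variable names are made up for illustration, and it only shows that blue vehicles are a majority of the lot, not why anyone parked there.

```python
# Counts of blue vehicles as listed in the post.
blue_counts = {"sedans": 24, "coupes": 17, "pickups": 31, "SUVs": 11}
total_vehicles = 125  # total vehicles in the lot, per the post

blue_total = sum(blue_counts.values())
blue_share = blue_total / total_vehicles

# prints "83 of 125 vehicles are blue (66.4%)"
print(f"{blue_total} of {total_vehicles} vehicles are blue ({blue_share:.1%})")
```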

Put statistics on top of politics and it becomes a load of crap.  It's why the Shy Tory Factor came into existence.  It's how Bill Clinton's wife carried more than a 90% chance of victory into the voting booth and before she threw back her last double-shot of loser vodka that night, she had lost by 74 electoral votes.

How long have republicans beat their wives?
Do you think Donald Trump is a bad president, a horrible president, or an alien from planet Nibiru?
Do you think Bill Clinton's wife is good for the country, great for the country, or the goodest most greatest candidate ever?
Do you identify as a democrat or a racist?
Would you consider yourself to be a liberal or a sexist?
Do you think guns should be banned, outlawed, or confiscated, and thrown into a furnace along with their owners?
...let's tally up those answers - annnnnd yep, the statistics prove that Americans hate Donald Trump.


----------



## compforce (Apr 25, 2019)

Box said:


> There are 24 blue sedans in the parking lot.
> There are 17 blue coupes in the parking lot.
> There are 31 blue pickup trucks in the parking lot.
> There are 11 blue SUV's in the parking lot.
> Graphed out over 125 total vehicles in the parking lot, it is statistically safe to say most people using that parking-lot like blue automobiles.



That depends on whether you are on the right or left.  If you are on the left, it is statistically safe to say that the owner of the parking lot discriminates against people that don't have blue cars...


----------



## Devildoc (Apr 25, 2019)

@Box , for statistics to be scientifically valid, they _have_ to look at facts.  While you certainly have polls re: "which flavor ice cream do you like," the results are essentially meaningless, unless you extrapolate and control for variables.  Even then, all any poll can do, or any statistical value can give you, is probability.  Statistics cannot "prove" or "disprove" anything.  That's one thing I hate about exit polling: I always lie.  You can't control for that.

See, this is why I talked about getting your geek on...


----------



## Grunt (Apr 25, 2019)

I hate statistics....

I just Googled Childhood Obesity and Starvation rates. Guess what?

1 in 5 are starving and 1 in 5 are obese...yep, I hate statistics!


----------



## Box (Apr 25, 2019)

> 1 in 5 are starving and 1 in 5 are obese...yep, I hate statistics!



those statistics make plenty of sense - fat kids are always hungry


----------



## Grunt (Apr 25, 2019)

Box said:


> those statistics make plenty of sense - fat kids are always hungry



Beautiful perspective!


----------



## Brill (Apr 25, 2019)

Box said:


> Statistics are great when they address facts:
> 
> There are 24 blue sedans in the parking lot.
> There are 17 blue coupes in the parking lot.
> ...



“To the owner of the blue vehicle, it’s being towed.”


----------



## Dienekes (Apr 25, 2019)

Talk about getting your geek on: 

Data: One of those above polls mentioned likely voters. What in the hell makes a likely voter? Do we run a regression of voter participation rates per party and delineate between all demographics across multiple categories in order to determine a usable confidence interval for each one, just to settle on a statistically appropriate number of people to sample? Should we use simple or stratified random sampling to create the most unbiased data for our given needs? Or do we just include a question in the poll "Do you plan on voting in the next election?" and throw out the no's? What if there are many flavors of Democrats and Republicans? What if most of the nonresponders are of a singular party and we have a nonresponse bias in the data? How big is that error? Would things change if our sample had an even number of rural and urban participants because god knows rural people never get polled?
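On the "how many people should we use" question, the textbook starting point is the margin of error for a simple random sample. This is a minimal sketch under stated assumptions: simple random sampling, a 95% confidence level (z = 1.96), and the worst-case proportion p = 0.5. The function names are hypothetical. It ignores likely-voter screens, weighting, and nonresponse, all of which tend to make the real-world error larger than this number.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation confidence interval
    for a proportion p estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

def sample_size_for(moe: float, p: float = 0.5, z: float = 1.96) -> int:
    """Smallest simple random sample that achieves the target margin of error."""
    return math.ceil(z ** 2 * p * (1 - p) / moe ** 2)

# Margin of error for a 1,000-person poll at the worst case p = 0.5.
print(f"n=1000 -> +/- {margin_of_error(0.5, 1000):.1%}")

# Respondents needed for a +/- 3 point margin under the same assumptions.
print(f"+/- 3 points -> n = {sample_size_for(0.03)}")
```

Under these assumptions a 1,000-person sample lands at roughly plus or minus 3 points, which is why so many published polls quote about that figure.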

Model: Are our questions unbiased enough? Are we asking the right questions? Do we use a Likert scale and if so what is the scale? What is our margin of error in how we expect these questions to go? Then, they should be tested for validity, but they never are in practice because that costs a lot of money so we go with best guess. Then, how do we structure the model and account for statistical significance in our predictions? Are the residuals homoscedastic or heteroscedastic? What if our model isn't exactly significant but we mainly write about the descriptive statistics and make our predictions off of that? Have we tested and controlled for endogeneity?

The point is the people who make these polls follow 1 of 2 paths: 1) Either they don't ask themselves a lot of these questions because they are trying to make money, not write a dissertation, and they make their best educated guesses, or 2) they actually have all of the reasons why this poll should have a larger margin of error than what the data states, but this never ever makes it into an article because, well, that doesn't generate clicks. The people that work for these places know all of this and far more. I don't know how rigorously they examine every statistical aspect, but you can be damn sure that an unpaid intern did as much of the data entry/manipulation work as possible, and you'll never hear about the weaknesses of a given poll in an article, and they rarely publish their data for replication (some places do, but they are rarely the ones that are cited in mainstream media).


----------



## SpongeBob*24 (Apr 25, 2019)

Grunt said:


> I hate statistics....
> 
> I just Googled Childhood Obesity and Starvation rates. Guess what?
> 
> 1 in 5 are starving and 1 in 5 are obese...yep, I hate statistics!



....and 2 in 5 don't have internet to answer the poll.


----------



## Brill (Apr 26, 2019)

SpongeBob*24 said:


> ....and 2 in 5 don't have internet to answer the poll.



Red State blues!!!


----------



## BlackCloud (Apr 26, 2019)

Ah yes…"lies, damn lies and statistics"


----------



## Brill (Apr 26, 2019)

Salt USMC said:


> FiveThirtyEight gave Trump a 30% chance of winning on election night - the highest percentage of any major statistical outfit.  They're legit.



But he won 100% of the required electoral votes. Legit?

If a fecal protection system, which boasted of a 30% success rate of stopping thrown matter, was recently installed in front of the monkey cage at the zoo, would you still visit said monkey exhibit?

A 30% chance of successfully choosing the winner between TWO candidates “is legit”? Seems a coin would have better odds.

Disclaimer: I don’t math.

The Coin Flip: A Fundamentally Unfair Proposition?
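On the 30% question: a probability forecast is not a pick. Saying a candidate has a 30% chance is a claim that, across many situations like this one, that candidate wins about 3 times in 10, so one such win does not by itself show the forecast was wrong. The sketch below is a toy simulation, assuming the underdog's true chance really is 0.30; the function name, trial count, and seed are made up for illustration and have nothing to do with how FiveThirtyEight builds its model.

```python
import random

def simulate_underdog_wins(p_win: float = 0.30,
                           n_trials: int = 10_000,
                           seed: int = 538) -> float:
    """Replay the same hypothetical election n_trials times and return
    the share of replays in which the underdog wins."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    wins = sum(rng.random() < p_win for _ in range(n_trials))
    return wins / n_trials

share = simulate_underdog_wins()
print(f"Underdog won {share:.1%} of simulated elections")
```

The fair test of a forecaster is calibration over many forecasts: of all the things they called at 30%, did about 30% happen? A single election cannot answer that either way.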


----------

