Yes, I know statistics isn’t the most glamorous or attention-grabbing topic, especially if you enjoy watching rabid partisans hurl verbal feces at one another. But for understanding reality and getting past partisanship, it matters far more than that spectacle does.
You need to know more than the individual stats themselves. You also need to understand the relationships surrounding those numbers, and which claims those numbers can and cannot justify. If someone cites a stat that sounds scary at first glance, you need to know whether it really is that scary once everything is considered. If you see a generalization based on a stat, you need to know whether the stat actually supports it.
Groups Not Individuals
The vast majority of important statistics you will see describe groups, not individuals. This distinction is extremely important because people tend to think anecdotes or exceptions disprove overarching trends when they don’t. If a stat says 20% of people will get food poisoning from eating at a certain restaurant, it is completely meaningless for you to say “I ate there and I was fine,” because most people (80% of them) were already expected to be fine.
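To make the arithmetic concrete, here is a minimal Python sketch that uses the hypothetical 20% figure from the example. It assumes each diner’s outcome is independent of everyone else’s. The point is that one healthy diner, or even a small healthy group, is exactly what the group-level stat predicts.

```python
# Hypothetical rate from the example: 20% of diners get food poisoning.
P_SICK = 0.20


def expected_outcomes(diners: int, p_sick: float = P_SICK) -> tuple[float, float]:
    """Return (expected number sick, expected number fine) for a group of diners."""
    sick = diners * p_sick
    return sick, diners - sick


def chance_all_fine(people: int, p_sick: float = P_SICK) -> float:
    """Probability that every one of `people` diners stays healthy.

    Assumes each diner's outcome is independent of the others'.
    """
    return (1 - p_sick) ** people


if __name__ == "__main__":
    sick, fine = expected_outcomes(1_000)
    print(f"Out of 1,000 diners: about {sick:.0f} sick, {fine:.0f} fine")
    print(f"Chance that you alone are fine: {chance_all_fine(1):.0%}")
    print(f"Chance that you and two friends are all fine: {chance_all_fine(3):.0%}")
```

Out of 1,000 diners, roughly 200 are expected to get sick and 800 to be fine. Even a group of three all leaving healthy happens a bit more than half the time, so “we were fine” says nothing against the 20% figure.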
Margin of Error
Suppose a poll says 55% of people favor Hillary Clinton over Trump (+/- 3%), while polls by others show 57% or 53%. That doesn’t disprove the first poll. All three results fall within the 52% to 58% range and are consistent with one another. Different numbers alone are not a sign of “fake news” or manipulation.
Even though results vary within the margin of error, properly conducted polls are extremely useful and usually give a picture very close to reality.
And yes, margins of error can be calculated quite precisely from the variability in a sample. Statisticians (real ones from academia, not political pundits) are not trying to put anything over on you. They tell you up front that their stats have a margin of error.
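Here is a minimal sketch of the rough check from the poll example: does another poll’s number fall inside the first poll’s margin-of-error range? This is a simplification. A formal comparison of two independent polls would look at the difference between them, and that difference has a larger margin of error than either poll alone (about 1.4 times as large when the two margins are equal).

```python
def interval(estimate: float, moe: float) -> tuple[float, float]:
    """Return the (low, high) range implied by an estimate and its margin of error."""
    return estimate - moe, estimate + moe


def within_margin(reference: float, moe: float, other: float) -> bool:
    """True if `other` falls inside the reference poll's margin-of-error range."""
    low, high = interval(reference, moe)
    return low <= other <= high


if __name__ == "__main__":
    # Figures from the poll example, in percentage points.
    first_poll, moe = 55.0, 3.0
    for other_poll in (57.0, 53.0):
        print(other_poll, within_margin(first_poll, moe, other_poll))  # both True
```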
Is a 1,000 or 2,000 Person Survey Really a Large Enough Sample?
Mathematically speaking, 1,000+ person samples are more than capable of giving an extremely accurate picture of the general population.
Assume a sample was chosen perfectly, with every individual having an equal probability of being chosen. In that case, analyzing the sampling distribution and the sample’s variance can give a nearly exact margin of error. With a 1,000-person sample, your margin of error is about +/- 3.1% at a 95% confidence level (in the worst case, when opinion is split 50/50). That is pretty good. With a sample of 2,000, it shrinks to about 2.2%.
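These figures come from the standard margin-of-error formula for a proportion estimated from a simple random sample, z * sqrt(p * (1 - p) / n). The sketch uses the worst case p = 0.5 and z = 1.96 for 95% confidence. It assumes a perfectly random sample and ignores the finite-population correction.

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Half-width of a confidence interval for a proportion from a simple random sample.

    p = 0.5 gives the worst case (widest interval); z = 1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    for n in (1_000, 2_000):
        print(f"n = {n}: +/- {margin_of_error(n):.1%}")
    # n = 1000: +/- 3.1%
    # n = 2000: +/- 2.2%
```

Doubling the sample from 1,000 to 2,000 shrinks the margin by a factor of about 1.4 (the square root of 2), not by half, so larger samples quickly give diminishing returns. The population size does not appear in the formula at all, provided the population is much larger than the sample.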
Of course, that assumes a perfectly random sample. Most samples aren’t perfectly random, because certain demographics are more likely to have phones and answer them, answer their e-mails, or respond to whatever other survey or polling medium the researcher uses in a given study. This raises the potential margin of error, but it too can be compensated for statistically, for example by weighting responses so that the sample’s demographics match the population’s.
And no, that isn’t a euphemism for “fudging the numbers.” Mountains of peer-reviewed scientific papers have demonstrated the effectiveness of various statistical methods for compensating for extraneous variables. Scientists choose the methods best suited to the accuracy of the particular study they are conducting. It is as simple as that; there is no need to posit a conspiracy by scientists just because you can’t understand the math.
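One common way to compensate for an unrepresentative sample is post-stratification weighting. Respondents from underrepresented groups count for more, and those from overrepresented groups count for less, so the weighted sample matches known population shares. The sketch below uses entirely made-up numbers (a sample that over-represents people 50 and older) and does not describe any particular pollster’s method. It also computes Kish’s effective sample size, which shows how weighting widens the margin of error. That widening is the cost of compensating for a non-random sample.

```python
import math

# Made-up sample: group -> (number of respondents, share who answered "yes").
sample = {"under_50": (300, 0.60), "50_plus": (700, 0.40)}
# Made-up population shares, e.g. as they might come from census data.
population_share = {"under_50": 0.50, "50_plus": 0.50}

n = sum(count for count, _ in sample.values())

# Unweighted estimate: every respondent counts equally.
unweighted = sum(count * rate for count, rate in sample.values()) / n

# Post-stratification weight per group: population share / sample share.
weights = {g: population_share[g] / (count / n) for g, (count, _) in sample.items()}

# Weighted estimate: each respondent counts in proportion to their group's weight.
sum_w = sum(weights[g] * count for g, (count, _) in sample.items())
weighted = sum(weights[g] * count * rate for g, (count, rate) in sample.items()) / sum_w

# Kish's effective sample size: unequal weights reduce the "useful" sample size.
sum_w2 = sum(weights[g] ** 2 * count for g, (count, _) in sample.items())
n_eff = sum_w**2 / sum_w2


def moe(size: float, p: float = 0.5, z: float = 1.96) -> float:
    """Worst-case 95% margin of error for a given (effective) sample size."""
    return z * math.sqrt(p * (1 - p) / size)


if __name__ == "__main__":
    print(f"unweighted {unweighted:.0%}, weighted {weighted:.0%}")
    print(f"effective n {n_eff:.0f}, margin of error {moe(n):.1%} -> {moe(n_eff):.1%}")
```

With these numbers the unweighted estimate is 46% and the weighted one is 50%. The effective sample size drops from 1,000 to 840, which widens the margin of error from about 3.1% to about 3.4%. Weighting can only correct for the characteristics you actually weight on.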
Framing Effects
People react very differently to a stat saying “30% of people will die from X” than to one saying “70% of people will survive X,” even though both convey exactly the same information. It is a well-documented phenomenon in psychology that people react more immediately and viscerally to information framed in terms of risk than to information framed in terms of reward.
Because of this, you can often be coaxed into overcompensating and supporting a policy that offers very little real benefit compared to its unintended consequences, from the Patriot Act to Muslim bans. Demagogues like the current president use this tactic of appealing to negative affect to fear-monger about Mexican immigrants, Muslims, and other groups that in reality pose little threat. The Left does the same thing with nuclear power generation.
Correlation Versus Causation
You have probably heard “correlation does not imply causation” many times before, but it remains true. There are lampoons showing the number of Nicolas Cage movies correlating with the number of people who drowned in a pool, or cheese consumption correlating with deaths from getting tangled in bed sheets.
Most spurious correlations are, of course, less obvious, and our decision whether to accept a correlation as proof of causation is often dictated purely by what is most convenient for our existing worldviews.
Correlation is a necessary but insufficient criterion for making a causal claim. In general, three criteria must be met:
- Temporal precedence: the thing you are claiming is the cause must be shown to actually precede what you are claiming is the effect.
- Correlation: the two must be correlated. Keep in mind that sometimes a correlation may not be apparent until you account for other variables. For example, some climate models show a weaker correlation between Earth’s temperature and CO2 concentrations until the Sun’s output is taken into consideration, after which the correlation is nearly perfect.
- Ruling out alternative explanations: you must rule out all or most of the alternative possible causes. This is probably the most important of the three, and also the most heinously neglected by partisans, politicians, and pundits, but only when it comes to their pet beliefs. When the other side makes a claim, partisans don’t settle for ruling out the most likely alternative explanations. Instead, they set the bar unreasonably high and expect the other side to rule out every other possibility, no matter how unlikely.
Graphs Can Be Used to Manipulate, but Those Graphs Can Be Identified by People Who Understand Stats
Often, to avoid accepting legitimate evidence, a partisan will reject a graph by claiming that graphs can be made to say anything. This, of course, is usually a selective skepticism that they don’t apply to graphs supporting their own point of view.
In reality, graphs are extremely useful for visualizing and putting data into perspective.
The problem is that most laypeople don’t have even a modicum of understanding of the principles of displaying statistics graphically. If they did, they would rarely be fooled by bogus graphs: they would always check for cited data sources, truncated scales, and other telltale signs of whether a graph is honest or deceptive.
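A truncated scale is one of the easier tricks to check numerically. The sketch below compares how large a difference looks when bars are drawn from a non-zero baseline with how large it actually is. Edward Tufte calls the ratio of the two the “lie factor.” The values 50 and 52 are hypothetical.

```python
def effect_size(first: float, second: float) -> float:
    """Relative change from first to second, e.g. 0.04 for a 4% increase."""
    return (second - first) / first


def lie_factor(first: float, second: float, baseline: float) -> float:
    """Tufte's lie factor for two bars drawn on an axis starting at `baseline`.

    The drawn bar heights are (value - baseline), so a baseline above zero
    exaggerates the visual difference between the bars.
    """
    shown = effect_size(first - baseline, second - baseline)
    actual = effect_size(first, second)
    return shown / actual


if __name__ == "__main__":
    print(lie_factor(50, 52, baseline=0))   # honest axis starting at zero
    print(lie_factor(50, 52, baseline=49))  # truncated axis starting at 49
```

Drawn from zero, the lie factor is 1, an honest picture. Starting the axis at 49 makes the second bar three times as tall as the first, so a 4% difference looks like a 200% one and the lie factor is about 50.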
It is reasonable to be skeptical about statistics, but past a certain point, skepticism turns into denial; knowing the basic concepts of statistics helps you tell the two apart. A firm understanding of stats can help you avoid being blindly partisan, and it builds a shield against being duped by partisans who misuse stats.
I am not a master statistician. I am a graduate student in psychology who has used statistics in practical research, and I have taken graduate-level Advanced Statistics and Philosophy of Science, both of which heavily informed this post. There may be minor errors in terminology, but I’m highly confident that the vast majority of this post aligns with what any statistics professor would tell you.