Statistics for Dummies

Authors: Deborah Jean Rumsey


Misinterpreted Correlations

The statistical definition of correlation is the strength and direction of the linear relationship between two numerical variables. In other words, it describes how closely, and in which direction, one numerical variable (for example, weight) tends to increase or decrease as another numerical variable (for example, height) increases or decreases. Correlation is one of the most misunderstood and misused statistical terms used by researchers, the media, and the general public. Three important points about correlation are as follows:

  • A correlation can't apply to two categorical variables, such as political party and gender. Instead, correlation applies only to two numerical variables, such as height and weight.
    So, if you hear someone say, "It appears that the voting pattern is correlated with gender," you know that's incorrect. Voting pattern and gender may be associated, but they can't be correlated, according to the statistical definition of correlation.

  • A correlation measures the strength and direction of the linear relationship between two numerical variables.
    In other words, if you collect data on two numerical variables (such as height and weight) and plot all of the points on a graph, when a correlation exists, you should be able to draw a straight line through those points (uphill or downhill), and the line should fit pretty well. If a line doesn't fit, the variables aren't correlated. However, that doesn't necessarily mean the variables aren't related. They may have some other type of relationship; they just don't have a linear relationship. For example, bacteria multiply at an exponential rate over time (their numbers keep doubling, so the growth gets faster and faster), not at a linear rate (which would be a steady increase over time).

  • Correlation doesn't automatically mean cause and effect.
    For example, suppose someone reports that more people who drink diet soda have brain tumors than people who don't. If you're a diet soda drinker, don't panic just yet. This may be a freak of nature, and someone happened to notice it. At most, it means more research needs to be done (beyond observation) in order to show diet soda causes brain tumors.
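
To see the second point in numbers, here is a minimal sketch that computes the usual (Pearson) correlation coefficient by hand. The function name pearson_r and both data sets are made up for illustration; they aren't from any study. The first data set falls exactly on an uphill line. The second follows a U-shaped curve, a different nonlinear pattern than the bacteria example, chosen because a straight line fits it about as badly as possible.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables move together, and how much each moves alone.
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_movement / math.sqrt(spread_x * spread_y)

# Hypothetical data, invented for this sketch only.
xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]   # every point sits on an uphill line
curved = [x ** 2 for x in xs]      # perfectly related to x, but not linearly

r_linear = pearson_r(xs, linear)
r_curved = pearson_r(xs, curved)
print("linear data:", round(r_linear, 3))
print("curved data:", round(r_curved, 3))
```

The curved data come out with a correlation of essentially zero even though y is completely determined by x. That's the trap: a correlation near zero rules out a linear relationship, not every relationship.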

 

Confounding Variables

A confounding variable is a variable that wasn't included in a study that may influence the results of the study, creating a confusing (that is, confounding) effect. For example, suppose a researcher tries to say that eating seaweed helps you live longer, and when you examine the study further, you find out that it was based on a sample of people who regularly eat seaweed in their diets and are over the age of 100. Suppose you then read interviews of these people and discover some of their other secrets to long life (besides eating seaweed): They ate very healthy foods, slept an average of 8 hours a day, drank a lot of water, and exercised every day. So did the seaweed cause them to live longer? Maybe, but you can't tell, because the confounding variables (exercise, water consumption, diet, and sleeping patterns) could also have caused longer life.

A common error in research studies is to fail to control for confounding variables, leaving the results open to scrutiny (and you want to be among those doing the scrutinizing)! The best way to control for confounding variables is to do a well-designed experiment, which involves setting up two groups that are alike in as many ways as possible, except that one group (called the treatment group) takes a treatment, and the other takes a placebo (a fake treatment; this group is called the control group). You then compare the results from the two groups, attributing any significant differences to the treatment (and to nothing else, in an ideal world).

The seaweed study wasn't a designed experiment; it was an observational study. In observational studies, researchers don't assign treatments or control the conditions; people are merely observed, and information is recorded.
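
A small simulation can show why the two kinds of study give different answers. This is only a sketch under loudly stated assumptions: the world below is invented, seaweed has no effect at all on lifespan, exercise (the confounder) adds 8 years, and exercisers are simply more likely to choose seaweed. None of these numbers come from a real study, and the names (lifespan, obs_gap, exp_gap) are made up for the example.

```python
import random

random.seed(0)  # fixed seed so the sketch gives the same numbers every run

def lifespan(exercises):
    """Made-up lifespan: exercise adds 8 years; seaweed plays no part."""
    return 75 + (8 if exercises else 0) + random.gauss(0, 3)

def mean(values):
    return sum(values) / len(values)

# Hypothetical population: True means the person exercises regularly.
people = [random.random() < 0.5 for _ in range(20000)]

# Observational study: people choose for themselves, and exercisers
# are assumed to be more likely to eat seaweed (80% versus 20%).
obs_seaweed, obs_none = [], []
for exercises in people:
    eats_seaweed = random.random() < (0.8 if exercises else 0.2)
    if eats_seaweed:
        obs_seaweed.append(lifespan(exercises))
    else:
        obs_none.append(lifespan(exercises))

# Designed experiment: a coin flip, not the person, decides the group,
# so exercisers end up spread about evenly across both groups.
exp_treatment, exp_control = [], []
for exercises in people:
    if random.random() < 0.5:
        exp_treatment.append(lifespan(exercises))
    else:
        exp_control.append(lifespan(exercises))

obs_gap = mean(obs_seaweed) - mean(obs_none)
exp_gap = mean(exp_treatment) - mean(exp_control)
print(f"observational gap: {obs_gap:.2f} years")
print(f"experimental gap:  {exp_gap:.2f} years")
```

In this invented world, the observational comparison shows seaweed eaters living several years longer, purely because they exercise more. Random assignment balances exercise across the two groups, and the gap shrinks to roughly zero, which is the right answer here.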

HEADS UP 

Whenever you're looking at the results of a study that's claiming to show a cause-and-effect relationship or significant differences between groups, check to see whether the study was a designed experiment and whether confounding variables were controlled for. If doing an experiment is unethical (for example, showing smoking causes lung cancer by forcing half of the subjects in the experiment to smoke 10 packs a day for 20 years, while the other half of the subjects smoke nothing), you must rely on mounting evidence from many observational studies that cover many different situations, all leading to the same result.

REMEMBER 

Observational studies are great for surveys and polls, but not for showing cause-and-effect relationships, because they don't control for confounding variables. A well-designed experiment provides much stronger evidence.

 

Botched Numbers

Just because a statistic appears in the media doesn't mean it's correct. In fact, errors appear all the time (by accident or by design), so stay on the lookout for them. Here are some tips for spotting botched numbers:

  • Make sure everything adds up to what it's reported to.
    With pie charts, be sure all the percentages add up to 100%.

  • Double-check even the most basic of calculations.
    For example, a chart says 83% of Americans are in favor of an issue, but the report says 7 out of every 8 Americans are in favor of the issue. Are these the same? (No, 7 divided by 8 is 87.5%; 5 out of 6 is about 83%.)

  • Look for the response rate of a survey; don't just be happy with the number of participants.
    (The response rate is the number of people who responded divided by the total number of people surveyed, times 100%.) If the response rate is much lower than 70%, the results could be biased, because you don't know what the non-respondents would have said.

  • Question the type of statistic used, to determine whether it's appropriate.
    For example, the number of crimes went up, but so did the population size. The researchers should have reported the crime rate (number of crimes per capita), instead.
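
The checks in this list are all quick arithmetic, so here they are as a short sketch. The 7-out-of-8 and 5-out-of-6 figures come from the example above; the pie chart slices, survey counts, and crime counts are hypothetical numbers invented for illustration, and percent is just a helper name chosen for this sketch.

```python
def percent(part, whole):
    """Express part out of whole as a percentage."""
    return 100 * part / whole

# Basic calculation check: is "7 out of every 8" the same as 83%?
seven_of_eight = percent(7, 8)
five_of_six = percent(5, 6)
print(f"7 out of 8 = {seven_of_eight:.1f}%")
print(f"5 out of 6 = {five_of_six:.1f}%")

# Pie chart check: hypothetical slices that should total 100%.
slices = [42, 31, 18, 12]
pie_total = sum(slices)
print(f"pie chart total = {pie_total}%")

# Response rate: hypothetical survey sent to 2,000 people, 640 replied.
response_rate = percent(640, 2000)
print(f"response rate = {response_rate:.1f}%")

# Crime rate: hypothetical town where crimes rose, but population rose more.
rate_before = 5000 / 100_000   # crimes per person, earlier year
rate_after = 5500 / 125_000    # crimes per person, later year
print(f"crimes per 1,000 people: {1000 * rate_before:.0f} then {1000 * rate_after:.0f}")
```

In the made-up town, the number of crimes went up by 500 while the crime rate went down, which is exactly why the per capita figure is the one to ask for.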

REMEMBER 

Statistics are based on formulas and calculations that don't know any better. The people plugging in the numbers should know better, though; when the numbers come out wrong, either they don't know better or they don't want you to catch on. You, as a consumer of information (also known as a certified skeptic), must be the one to take action. The best policy is to ask questions.

 

Selectively Reporting Results

Another bad scenario is when a researcher reports his one statistically significant result (a result that was very unlikely to have occurred simply by chance), but leaves out the part where he actually conducted hundreds of other, unreported tests, all of which were not significant. If you had known about all of the other tests, you might have wondered whether the one significant result was meaningful or simply due to chance, given the large number of tests performed. This is what statisticians like to call data snooping or data fishing.

How do you protect yourself against misleading results due to data fishing? Find out more details about the study: how many tests were done, how many results weren't significant, and what was found to be significant. In other words, get the whole story if you can, so that you can put the significant results into perspective.
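
To put that perspective into numbers, here is a rough sketch of how quickly chance alone produces a "significant" result as the number of tests grows. It rests on two assumptions that aren't stated in the text: each test uses a 5% significance cutoff (a common convention), and the tests are independent of one another. The function name chance_of_false_alarm is made up for this example.

```python
def chance_of_false_alarm(num_tests, alpha=0.05):
    """Probability that at least one of num_tests independent tests looks
    'significant' at cutoff alpha when there is no real effect anywhere."""
    return 1 - (1 - alpha) ** num_tests

for num_tests in (1, 10, 100):
    chance = chance_of_false_alarm(num_tests)
    expected = num_tests * 0.05   # average number of false alarms
    print(f"{num_tests:>3} tests: chance of at least one false alarm = "
          f"{chance:.3f}, expected false alarms = {expected:.1f}")
```

Under these assumptions, a researcher who runs a hundred tests on pure noise is almost guaranteed at least one significant result to report, which is why the number of tests performed matters as much as the result itself.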

REMEMBER 

To spot fudged numbers and errors of omission, the best policy is to remember that if something looks too good to be true, it probably is. Don't just go on the first result that you hear, especially if it makes big news. Wait to see whether others can verify and replicate the result.

 

The Almighty Anecdote

Ah, the anecdote: one of the strongest influences on public opinion and behavior ever created, and one of the least statistical. An anecdote is a story based on a single person's experience or situation. For example:

  • The waitress who won the lottery

  • The cat that learned how to ride a bicycle

  • The woman who lost a hundred pounds in two days on the miracle potato diet

  • The celebrity who claims to have used an over-the-counter hair color for which she is a spokesperson (yeah, right)

Anecdotes make great news; the more sensational the better. But sensational stories are outliers from the norm of life. They don't happen to most people.

You may think you're out of reach of the influence of anecdotes. But what about those times when you let one person's experience influence you? Your neighbor loves his Internet Service Provider, so you try it, too. Your friend had a bad experience with a certain brand of car, so you don't bother to test-drive it. Your dad knows somebody who died in a car crash because they were trapped in the car by their seat belt, so he decides never to wear his.

While some decisions are okay to make based on anecdotes, some of the more important decisions you make should be based on real statistics and real data that come from well-designed studies and careful research.

HEADS UP 

An anecdote is a data set with a sample size of only one. You have no information with which to compare the story, no statistics to analyze, no possible explanations or information to go on — just a single story. Don't let anecdotes have much influence over you. Instead, rely on scientific studies and statistical information based on large random samples of individuals who represent their target populations (not just a single situation).

REMEMBER 

The best thing to do when someone tries to persuade you by telling you an anecdote is to respond by saying, "Show me the data!"

 

Sources

This appendix contains the sources that I use in my examples throughout this book. Because you're a budding statistical detective, you may want to follow up on some of these to get more information that leads you to more informed decisions.

Chapter 1

All newspaper articles mentioned were taken from The Cincinnati Enquirer and The Columbus Dispatch, January 26, 2003.

 

Chapter 2

Microwaving leftovers survey: USA Today, September 6, 2001.

Trident Gum Web site: http://www.tridentgum.com/consumer/html/c0000.html.

Number of crimes in the U.S. from 1990–1998, taken from the FBI Uniform Crime Reports: http://www.fbi.gov/ucr/ucr.htm.

Kansas Lottery Web site: http://www.kslottery.com.

Good bedside manner can fend off malpractice suits: USA Today, February 19, 1997.

Ross Perot's survey: TV Guide, March 21, 1993; for more information, contact United We Stand America at: http://www.uwsa.com.

Journal of the American Medical Association: http://jama.ama-assn.org.

The New England Journal of Medicine: http://content.nejm.org.

The Lancet: http://www.thelancet.com.

British Medical Journal: http://bmj.com.

The Gallup Organization: http://www.gallup.com.

 

Chapter 3

The Gallup Organization: http://www.gallup.com.

U.S. Census Bureau: http://www.census.gov.

Zinc for colds; pillow position and sleep; shoe heel height and feet comfort: these studies are referenced in "Healthy Habits — that Aren't", Woman's Day, February 11, 2003.

Cricket chirps and temperature: "Cricket thermometers", Field & Stream, July 1993, Vol. 98, Issue 3, p. 21; for data, see: The Songs of the Insects (1949), by George W. Pierce, Harvard University Press, pp. 12–21.

Crimes and police, U.S. Dept. of Justice: http://www.ojp.usdoj.gov/bjs/lawenf.htm.

Ice cream and murders (New York City): a good article to start investigating this subject and related issues is Spellman, B. A., & Mandel, D. R. (2003). For more on the psychology of causal reasoning, see Nadel, L. (Ed.) Encyclopedia of Cognitive Science (Vol. 1, pp. 461–466).

 

Chapter 4

Consumer Expenditure Survey, Bureau of Labor Statistics: http://www.bls.gov/cex.

Lotteries: (Ohio) http://www.ohiolottery.com; (Florida) http://www.flalottery.com/lottery/edu/edu.html; (Michigan) http://www.michigan.gov/lottery; (New York) http://www.nylottery.org.

Tax dollar pizza: http://www.irs.gov/app/cgi-bin/slices.cgi.

Population/racial/workforce trends, U.S. Department of Labor, Herman Report: "Futurework: Trends and Challenges for Work in the 21st Century": http://www.dol.gov/asp/programs/history/herman/reports/futurework.

Transportation expenses, Bureau of Transportation Statistics: http://www.bts.gov/publications/transportation_in_the_united_states/pdf/teconomy.pdf.

Birth statistics, Colorado Department of Public Health and Environment: http://www.cdphe.state.co.us/../cohid/birthdata.html.

Internal Revenue Service: http://www.irs.gov.

Occupation employment and wage estimates, Bureau of Labor Statistics: http://www.wa.gov/esd/lmea/occdata/oeswage/Page2067.htm.

Population estimates by state, U.S. Census Bureau: http://eire.census.gov/popest/data/states/tables/ST-EST2002-01.php.
