Statistics for Dummies

Authors: Deborah Jean Rumsey

Spotting Misleading Confidence Intervals

When data come from well-designed surveys or experiments (see Chapters 16 and 17) and are based on large random samples (see Chapter 9), you can feel good about the quality of the information. When the margin of error is small, relatively speaking, you would like to say that these confidence intervals provide accurate and credible estimates of their parameters. This is not always the case, however.

HEADS UP 

Not all estimates are as accurate and reliable as some may want you to think. For example, a Web site survey result based on 20,000 hits may have a small margin of error according to the formula, but the margin of error means nothing if the survey is only given to people who happened to visit that Web site. In other words, the sample isn't even close to being a random sample (where everyone in the population has an equal chance of being chosen to participate). Nevertheless, such results do get reported, along with their margins of error that make the study seem truly scientific. Beware of these bogus results! (See Chapter 10 for more on the limits of the margin of error.)
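To see why a big sample can't rescue a biased one, here is a small Python simulation built on made-up assumptions. The true population percentage is taken to be 30%, but the people who happen to visit the Web site hold that opinion 60% of the time. Both rates and the helper name `margin_of_error` are hypothetical. The formula inside it is the standard 95% margin of error for a sample proportion.

```python
import math
import random

# Hypothetical rates, chosen only for illustration.
TRUE_POPULATION_RATE = 0.30   # the parameter you actually want to estimate
WEB_VISITOR_RATE = 0.60       # rate among the self-selected Web site visitors
N_HITS = 20_000               # number of survey responses collected on the site

def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Standard 95% margin of error for a sample proportion."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

random.seed(1)
# Every response comes from a Web site visitor, not from the whole population.
yes = sum(random.random() < WEB_VISITOR_RATE for _ in range(N_HITS))
p_hat = yes / N_HITS
moe = margin_of_error(p_hat, N_HITS)
low, high = p_hat - moe, p_hat + moe

print(f"Estimate: {p_hat:.1%} +/- {moe:.1%}")
print(f"Interval contains the true {TRUE_POPULATION_RATE:.0%}? "
      f"{low <= TRUE_POPULATION_RATE <= high}")
```

The margin of error comes out under 1 percentage point, yet the interval lands nowhere near 30%. The formula measures only the chance variation from sample to sample. It does not measure the bias that comes from who got surveyed.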

REMEMBER 

Before making any decisions based on someone's estimate, do the following:

  • Investigate how the statistic was created; it should be the result of a scientific process that results in reliable, unbiased, accurate data. (See Chapters 2 and 3.)

  • Look for a margin of error. If one isn't reported, go to the original source and request it.

  • Remember that if the statistic isn't reliable or contains bias, the margin of error will be meaningless. (See Chapter 16 for avoiding bias in survey data and see Chapter 17 for criteria for good data in experiments.)


Chapter 12: Calculating Accurate Confidence Intervals

A confidence interval is a fancy phrase for a statistic that has a margin of error to go along with it (see Chapter 11 for a basic overview of confidence intervals; see Chapter 10 for information on margin of error). Because most statistics are calculated for the purpose of estimating characteristics of a population (called parameters) from a sample, every statistic should include a margin of error as a measure of its level of accuracy. After all (as Chapter 9 says), when you're taking samples, results vary!

In this chapter, you find out how to calculate your own confidence interval. You also get the lowdown on some of the finer points of confidence intervals: what makes them narrow or wide, what makes you more or less confident in their results, and what they do and don't measure. With this information, you know what to look for when you're presented with statistical results, and you understand how to gauge the true accuracy of those results.

Calculating a Confidence Interval

A confidence interval is composed of a statistic, plus or minus a margin of error (see Chapter 10). For example, suppose you want to know the percentage of vehicles in the United States that are pickup trucks (that's the parameter, in this case). You can't look at every single vehicle in the United States, so you take a random sample of 1,000 vehicles over a range of highways at different times of the day. You find that 7% of the vehicles in your sample are pickup trucks. Now, you don't want to say that exactly 7% of all vehicles on U.S. roads are pickup trucks, because you know this is only based on the 1,000 vehicles you sampled. While you hope that 7% is close to the true percentage, you can't be sure because you based your results on a sample of vehicles, not on all of the vehicles in the United States.

So what do you do? You add and subtract a margin of error to indicate how much error you expect your sample result to have. (See Chapter 10 for more on the margin of error.) The error isn't due to anything you did wrong; it simply comes from the fact that a sample (a study of a portion of the population), not a census (a study of the entire population), was done.

The width of your confidence interval is two times the margin of error. For example, suppose the margin of error was 5%. A confidence interval of 7%, plus or minus 5%, goes from 7% − 5% = 2% all the way up to 7% + 5% = 12%. That means it has a width of 12% − 2% = 10%. A simpler way to get the same answer is to double the margin of error: 2 × 5% = 10%.

REMEMBER 

The width of a confidence interval is the distance from the lower end of the interval (statistic − margin of error) to the upper end of the interval (statistic + margin of error). You can always calculate the width of a confidence interval quickly by taking 2 times the margin of error.
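The arithmetic above fits in a few lines of Python. This sketch works in percentage points, and the function name `confidence_interval` is just an illustrative choice. It computes both ends of 7% ± 5% and confirms that the width equals twice the margin of error.

```python
def confidence_interval(statistic: float, margin: float) -> tuple[float, float]:
    """Return the (lower, upper) ends of statistic +/- margin of error."""
    return statistic - margin, statistic + margin

low, high = confidence_interval(7.0, 5.0)   # 7% plus or minus 5%, in percentage points
width = high - low                          # distance from lower end to upper end

print(low, high, width)   # 2.0 12.0 10.0
```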

The following are the general steps for estimating a parameter with a confidence interval, along with references where you can find more detailed information on how to accomplish each step.

  1. Choose your confidence level and your sample size (see Chapter 9).

  2. Select a random sample of individuals from the population (see Chapter 3).

  3. Collect reliable and relevant data from the individuals in the sample.

    See Chapter 16 for survey data and Chapter 17 for data from experiments.

  4. Summarize the data into a statistic, usually a mean or proportion (see Chapter 5).

  5. Calculate the margin of error (see Chapter 10).

  6. Take the statistic plus or minus the margin of error to get your final estimate of the parameter.

    This is called a confidence interval for that parameter.
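As a rough illustration, the six steps can be sketched in Python for a proportion. The sketch assumes a made-up population of 1,000,000 vehicles, 7% of them pickup trucks, so you can check the answer against a known truth. It also assumes the data are recorded accurately (step 3) and uses the standard large-sample margin-of-error formula for a proportion. Z comes from Python's `statistics.NormalDist`.

```python
import math
import random
from statistics import NormalDist

random.seed(42)

# Hypothetical population: 1 = pickup truck, 0 = any other vehicle (7% pickups, made up).
population = [1] * 70_000 + [0] * 930_000

# Step 1: choose a confidence level and a sample size.
confidence_level = 0.95
n = 1_000

# Step 2: select a simple random sample (without replacement).
sample = random.sample(population, n)

# Steps 3-4: the recorded data are summarized into a sample proportion.
p_hat = sum(sample) / n

# Step 5: margin of error = Z * standard error of the sample proportion.
z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)   # about 1.96 for 95%
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
margin = z * standard_error

# Step 6: statistic plus or minus the margin of error.
lower, upper = p_hat - margin, p_hat + margin
print(f"{p_hat:.1%} +/- {margin:.1%}  ->  ({lower:.1%}, {upper:.1%})")
```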

Teen attitudes toward smokeless tobacco

An ongoing study conducted by the University of Michigan monitors teenagers' attitudes on a number of issues, including the perceived risk of smokeless tobacco (commonly called chewing tobacco). The study shows that more of today's teenagers perceive smokeless tobacco as a great risk than teenagers did 15 years ago. The results are as follows.

  • In a sample in 2001 of 2,100 twelfth graders, 45.4% perceived smokeless tobacco to cause a great risk for harm. The margin of error was plus or minus 2%.

    A 95% confidence interval for the percentage of all twelfth graders who perceive smokeless tobacco to be of great risk is therefore 45.4% ± 2%.

  • Based on a sample of 3,000 twelfth graders in 1986, the confidence interval for all twelfth graders who felt that smokeless tobacco caused a great risk was 25.8% ± 1.6%.
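As a rough check on these figures, the standard 95% margin-of-error formula for a sample proportion is Z·√(p̂(1 − p̂)/n) with Z = 1.96. It comes close to the reported margins if you assume simple random sampling. That is an assumption, because the study's actual sampling design may differ. This Python sketch also checks whether the 2001 interval sits entirely above the 1986 one.

```python
import math

def proportion_interval(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (margin, lower, upper) for a 95% interval, assuming a simple random sample."""
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return margin, p_hat - margin, p_hat + margin

# Figures from the study as reported in the sidebar.
moe_2001, low_2001, high_2001 = proportion_interval(0.454, 2_100)
moe_1986, low_1986, high_1986 = proportion_interval(0.258, 3_000)

print(f"2001: 45.4% +/- {moe_2001:.1%}")   # close to the reported 2%
print(f"1986: 25.8% +/- {moe_1986:.1%}")   # close to the reported 1.6%
print("2001 interval entirely above 1986 interval?", low_2001 > high_1986)
```

The two intervals don't overlap. That is consistent with the study's conclusion that perceived risk rose between 1986 and 2001.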


Choosing a Confidence Level

Notice that the example showing teen attitudes toward tobacco (see the "Teen attitudes toward smokeless tobacco" sidebar) includes the phrase "95% confidence interval." Every confidence interval (and every margin of error, for that matter) has a confidence level associated with it. In that example, the confidence level was 95%. A confidence level helps you account for the other possible sample results you could have gotten, when you're making an estimate of a parameter using the data from only one sample. If you want to account for 95% of the other possible results, your confidence level would be 95%.

Variability in sample results is measured in terms of number of standard errors. A standard error is similar to the standard deviation of a data set, only a standard error applies to the sample means or sample percentages you could have gotten if different samples were taken. (See Chapter 10 for information on standard errors.) Every confidence level has a corresponding number of standard errors that have to be added and subtracted. This number of standard errors is called the Z-value (because it corresponds to the standard normal distribution). See Table 10-1 in Chapter 10.

What confidence level is typically used by researchers? I've seen confidence levels ranging from 80% to 99%. The most common confidence level is 95%. In fact, statisticians have a saying that goes, "Why do statisticians like their jobs? Because they have to be correct only 95% of the time." (Tacky, but sort of catchy, isn't it?)

Being 95% confident means that if you take many, many samples and calculate a confidence interval from each one, about 95% of those intervals will be right on target and actually contain the true parameter. In order to have a 95% confidence level, the empirical rule says that you need to add and subtract "about" 2 standard errors. The central limit theorem allows you to be more exact about the amount, so the "about 2" actually becomes 1.96. See Table 10-1 in Chapter 10 for selected confidence levels and their corresponding Z-values.
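You can watch the "many, many samples" idea happen in a quick Python simulation. The sketch assumes a made-up true pickup-truck share of 7%, which is known here only because the simulation invents it. It draws 2,000 samples of 1,000 vehicles each, builds a 95% interval from each one with the usual large-sample formula, and counts how often the interval captures the truth.

```python
import math
import random

random.seed(7)

TRUE_P = 0.07      # hypothetical true proportion (known here only because we simulate it)
N = 1_000          # sample size for each survey
TRIALS = 2_000     # number of repeated samples
Z = 1.96           # Z-value for 95% confidence

hits = 0
for _ in range(TRIALS):
    # Draw one random sample and compute its sample proportion.
    successes = sum(random.random() < TRUE_P for _ in range(N))
    p_hat = successes / N
    margin = Z * math.sqrt(p_hat * (1 - p_hat) / N)
    # Count the interval as "on target" if it captures the true parameter.
    if p_hat - margin <= TRUE_P <= p_hat + margin:
        hits += 1

coverage = hits / TRIALS
print(f"{coverage:.1%} of the {TRIALS} intervals contain the true value")
```

The printed coverage should land near 95%. It won't be exactly 95%, because the number of samples is finite and the formula is a large-sample approximation.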

If you want to be more than 95% confident about your results, you need to add and subtract more than two standard errors. For example, to be 99% confident, you would add and subtract about 2.58 standard errors to obtain your margin of error. The higher the confidence level, the larger the Z-value, the larger the margin of error, and the wider the confidence interval (assuming everything else stays the same). You have to pay a certain price for more confidence.
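The Z-value for any confidence level can be computed from the standard normal distribution. This Python sketch uses `statistics.NormalDist().inv_cdf` and holds a hypothetical standard error of 1.0 fixed, so you can see the margin of error grow as the confidence level rises. The function name `z_value` is illustrative.

```python
from statistics import NormalDist

def z_value(confidence_level: float) -> float:
    """Number of standard errors to add and subtract for a two-sided interval."""
    return NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)

standard_error = 1.0   # hypothetical standard error, held fixed
for level in (0.80, 0.90, 0.95, 0.99):
    z = z_value(level)
    print(f"{level:.0%}: Z = {z:.3f}, margin of error = {z * standard_error:.3f}")
```

Z rises from about 1.28 at 80% confidence to 1.96 at 95% and about 2.58 at 99%. With the standard error held fixed, the margin of error rises right along with it.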

HEADS UP 

Note that I said "assuming everything else stays the same." You can offset an increase in the margin of error by increasing the sample size. See the "Factoring in the Sample Size" section for more on this.


Zooming In on Width

The ultimate goal when making an estimate using a confidence interval is to have the confidence interval be narrow. That means you're zooming in on what the parameter is. Having to add and subtract a large amount only makes your result much less accurate. For example, suppose you're trying to estimate the percentage of semi trucks on the interstate between the hours of 12 a.m. and 6 a.m., and you come up with a 95% confidence interval that claims the percentage of semis is 50%, plus or minus 50%. Wow, that narrows it down! (Not.) You've defeated the purpose of trying to come up with a good estimate.

In this case, the confidence interval is much too wide. You'd rather say something like this: A 95% confidence interval for the percentage of semis on the interstate between 12 a.m. and 6 a.m. is 50%, plus or minus 3%. This would require a much larger sample size, but that would be worthwhile.

So, if a small margin of error is good, is smaller even better? Not always. To get an extremely narrow confidence interval, you have to conduct a much more difficult — and expensive — study, so a point comes where the increase in price doesn't justify the marginal difference in accuracy. Most people are pretty comfortable with a margin of error of 2% to 3% when the estimate itself is a percentage (like the percentage of women, Republicans, or smokers).
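To put rough numbers on that trade-off, here is a Python sketch of the standard sample-size formula for a proportion, n = Z²·p(1 − p)/MOE², rounded up. It assumes 95% confidence and the most conservative guess, p = 0.5. The function name `sample_size_for` is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for(margin: float, confidence_level: float = 0.95, p: float = 0.5) -> int:
    """Smallest n whose margin of error for a proportion is at most `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

for margin in (0.05, 0.03, 0.02, 0.01):
    print(f"+/- {margin:.0%}: n = {sample_size_for(margin)}")
# +/- 5%: n = 385
# +/- 3%: n = 1068
# +/- 2%: n = 2401
# +/- 1%: n = 9604
```

Cutting the margin of error in half roughly quadruples the required sample size. That is why pushing below 2% to 3% quickly gets expensive.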

REMEMBER 

A narrow confidence interval is a good thing.

How do you go about ensuring that your confidence interval will be narrow enough? You certainly want to think about this issue before collecting your data; after the data are collected, the width of the confidence interval is set.

Three factors affect the width of a confidence interval:

  • The confidence level (as discussed in the preceding section)

  • The sample size

  • The amount of variability in the population

The formula for the margin of error associated with a sample mean is $Z \frac{\sigma}{\sqrt{n}}$, where:

  • Z is the value from the standard normal distribution corresponding to the confidence level (see Table 10-1 in Chapter 10).

  • σ is the population standard deviation, which measures the amount of variability in the population.

  • n is the sample size (see Chapter 9).

  • $\frac{\sigma}{\sqrt{n}}$ is the standard error of the sample mean. (See Chapter 10 for more on standard error.)

A confidence interval for the average would then be $\bar{x}$ (the sample mean) plus or minus the margin of error. Chapter 13 gives the formulas for the most common confidence intervals you're likely to come across.
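Here is the margin-of-error formula for a mean turned into a small Python function. The inputs (a sample mean of 50, σ = 10, n = 100) are made up for illustration. Like the formula, the sketch assumes the population standard deviation σ is known. The function name `mean_confidence_interval` is hypothetical.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(x_bar: float, sigma: float, n: int,
                             confidence_level: float = 0.95) -> tuple[float, float]:
    """x_bar +/- Z * sigma / sqrt(n), assuming the population standard deviation is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

low, high = mean_confidence_interval(x_bar=50.0, sigma=10.0, n=100)
print(f"({low:.2f}, {high:.2f})")   # (48.04, 51.96)
```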

Each of these three factors (confidence level, sample size, and population variability) plays an important role in influencing the width of a confidence interval. You've already seen the effects of confidence level. In the following section, you explore how sample size and population variability affect the width of a confidence interval.
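Before that, a quick numerical preview may help. This Python sketch uses the same margin-of-error formula for a mean, assuming σ is known, and computes the interval width as 2·Z·σ/√n. It shows that quadrupling the sample size halves the width, while doubling the population standard deviation doubles it. The function name and inputs are hypothetical.

```python
import math
from statistics import NormalDist

def interval_width(sigma: float, n: int, confidence_level: float = 0.95) -> float:
    """Width of the interval = 2 * Z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return 2 * z * sigma / math.sqrt(n)

base = interval_width(sigma=10.0, n=100)
print(f"baseline width:     {base:.2f}")
print(f"4x the sample size: {interval_width(10.0, 400):.2f}")   # half as wide
print(f"2x the variability: {interval_width(20.0, 100):.2f}")   # twice as wide
```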

HEADS UP 

Note that the sample statistic itself (for example, 7% of vehicles in the sample are pickup trucks) doesn't set the width of the confidence interval. Instead, the margin of error and the three factors involved in it are totally responsible for determining the width of a confidence interval. (For a percentage, the sample result does enter the margin of error, but only as an estimate of the population's variability, p̂(1 − p̂).)
