Statistics Essentials For Dummies (26 page)

Authors: Deborah Rumsey

results in a narrower confidence interval.

Here's where a large sample size really comes in handy. When you need a high level of confidence, you have to increase the z*-value and, hence, the margin of error. This makes your confidence interval wider (not good). But you can offset this wider confidence interval by increasing the sample size and bringing the margin of error back down, thus narrowing the confidence interval. The increase in sample size allows you to still have the confidence level you want, but also ensures that the width of your confidence interval will be small (which is what you ultimately want).

You can determine the sample size you need to achieve a certain margin of error before you start a study. When estimating a population mean, you can use the following sample size formula:

$$n = \left\lceil \left( \frac{z^* \sigma}{\text{MOE}} \right)^2 \right\rceil$$

where MOE is your desired margin of error; σ is the population standard deviation; and z* is the value on the Z-distribution that corresponds to the confidence level you want (Table 7-1).

Notice that the bracket notation on the outside of the equation for n has a flat ledge on top and no ledge on the bottom. That means you are supposed to round up your result to the "next greatest integer." In other words, always round up your answer to the next integer if you have anything after the decimal point; even 107.01 is rounded up to 108. This ensures that you won't exceed the margin of error you need.
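To see the formula and the round-up rule in action, here is a minimal Python sketch. The function names `z_star` and `required_sample_size`, and the example inputs (σ = 10, MOE = 2, 95% confidence), are made up for illustration and don't come from the book. The z*-value is computed from the standard normal distribution instead of being looked up in a table, so it comes out as roughly 1.96 for 95% confidence.

```python
import math
from statistics import NormalDist


def z_star(confidence: float) -> float:
    """Two-sided critical value on the Z-distribution for a confidence level."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def required_sample_size(moe: float, sigma: float, confidence: float = 0.95) -> int:
    """Sample size for estimating a mean: (z* * sigma / MOE)^2, rounded up."""
    raw = (z_star(confidence) * sigma / moe) ** 2
    return math.ceil(raw)  # always round up, never to the nearest integer


# Hypothetical inputs: population standard deviation 10, desired MOE of 2.
n = required_sample_size(moe=2.0, sigma=10.0, confidence=0.95)
achieved_moe = z_star(0.95) * 10.0 / math.sqrt(n)
print(n, round(achieved_moe, 3))
```

With these inputs the unrounded result is a little over 96, so the function returns 97. Rounding down to 96 would leave the margin of error slightly above the target of 2, which is exactly what rounding up protects you from.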

If the population standard deviation, σ, is unknown, you can do a pilot study (a small study before the full-blown study) and use its sample standard deviation (s) as a substitute for σ. At that point you would use the appropriate value on the t-distribution with n - 1 degrees of freedom, rather than z*. (See Chapter 9 for info on the t-distribution.)

When your statistic is a sample proportion or percentage (such as the proportion of females, or the percentage of semis), a quick-and-dirty way to figure margin of error is to take 1 divided by the square root of n (the sample size). Try different values of n and see how the margin of error is affected.
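If you'd rather let a computer try the different values of n, a short Python sketch like the following does the job. The function name `quick_moe` and the particular sample sizes are illustrative choices, not something from the book. The shortcut lines up closely with a 95% margin of error for a proportion near 50%, so treat the results as ballpark figures.

```python
import math


def quick_moe(n: int) -> float:
    """Quick-and-dirty margin of error for a sample proportion: 1 / sqrt(n)."""
    return 1 / math.sqrt(n)


# Try a few sample sizes and watch the margin of error shrink.
for n in (100, 400, 1000, 2500):
    print(f"n = {n:>5}: about +/- {quick_moe(n):.1%}")
```

Notice that you have to quadruple the sample size (from 100 to 400, say) just to cut the margin of error in half.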

Approximately what sample size is needed to have a narrow confidence interval with respect to polls? Using the formula in the preceding paragraph, you can make some quick comparisons. A survey of 100 people will have a margin of error of about $1/\sqrt{100} = 0.10$, or plus or minus 10% (which is fairly large). However, if you survey 1,000 people, your margin of error decreases dramatically, to plus or minus $1/\sqrt{1000} \approx 0.03$, or about 3%. A survey of 2,500 people in the U.S. results in a margin of error of plus or minus 2%. This sample size gives amazing accuracy when you think about how large the U.S. population is (well over 300 million).

Keep in mind, however, that you don't want to go too high with your sample size because there is a point where you start getting diminishing returns. For example, moving from a sample size of 2,500 to 5,000 narrows the margin of error of the confidence interval to about 1.4%, down from 2%. Each time you survey one more person, the cost of your survey in terms of money and time increases, so adding another 2,500 people to the survey just to narrow the interval by less than six tenths of 1% may not be worthwhile.
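The diminishing return is easy to check with the 1-over-square-root-of-n shortcut. The sketch below is illustrative only: the helper names `quick_moe` and `moe_gain` are invented here, and the comparison with a starting sample of 100 is an added example, not one from the book.

```python
import math


def quick_moe(n: int) -> float:
    """Quick margin of error for a sample proportion: 1 / sqrt(n)."""
    return 1 / math.sqrt(n)


def moe_gain(n_from: int, n_to: int) -> float:
    """How much the margin of error shrinks when the sample grows."""
    return quick_moe(n_from) - quick_moe(n_to)


# Adding 2,500 people to a survey that already has 2,500 respondents...
late_gain = moe_gain(2500, 5000)
# ...versus adding the same 2,500 people to a survey of only 100.
early_gain = moe_gain(100, 2600)

print(f"2,500 -> 5,000: margin of error shrinks by {late_gain:.2%}")
print(f"  100 -> 2,600: margin of error shrinks by {early_gain:.2%}")
```

The same 2,500 extra respondents buy you about 8 percentage points when you start small, but less than six tenths of a percentage point once you already have 2,500.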

Real accuracy depends on the quality of the data as well as on the sample size. A large sample size that has a great deal of bias (see Chapter 12) may appear to have a narrow confidence interval but actually means nothing. It's better to have a smaller sample size that contains good data than a larger sample size with a lot of bias.

Counting On Population Variability

Another factor influencing variability in sample results is the variability (standard deviation) within the population itself. For example, in a population of houses in a large city like Columbus, Ohio, you see a large amount of variability in price. This variability in house price over the whole city will be higher than the variability in house price if your population were limited to a certain housing development in Columbus (where the houses are likely to be similar to each other).

As a result, if you take a sample of houses from the entire city of Columbus and find the average price, the margin of error will be larger than if you take a sample from one single housing development in Columbus. So you'll need to sample more houses from the entire city of Columbus in order to have the same amount of accuracy that you would get from a single housing development.
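To put some numbers on the housing example, here is a rough Python sketch. Every figure in it is hypothetical: the standard deviations ($80,000 citywide, $15,000 within one development), the sample size of 50, and the target margin of error of $5,000 are invented for illustration, and 1.96 is used as the z*-value for 95% confidence.

```python
import math

Z_95 = 1.96  # z*-value for 95% confidence


def margin_of_error(sigma: float, n: int, z: float = Z_95) -> float:
    """Margin of error for a sample mean: z* * sigma / sqrt(n)."""
    return z * sigma / math.sqrt(n)


def required_sample_size(sigma: float, moe: float, z: float = Z_95) -> int:
    """Sample size needed for a given margin of error, rounded up."""
    return math.ceil((z * sigma / moe) ** 2)


# Hypothetical standard deviations of house prices, in dollars.
sigma_city = 80_000         # whole city: prices vary a lot
sigma_development = 15_000  # one development: prices are similar

# Same sample size, different margins of error.
moe_city = margin_of_error(sigma_city, n=50)
moe_development = margin_of_error(sigma_development, n=50)

# Same target margin of error ($5,000), different sample sizes.
n_city = required_sample_size(sigma_city, moe=5_000)
n_development = required_sample_size(sigma_development, moe=5_000)

print(round(moe_city), round(moe_development))
print(n_city, n_development)
```

Under these made-up numbers, the citywide study needs 984 houses to match the accuracy that the single development reaches with only 35.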

You can also look at it mathematically. Variability is measured in terms of standard errors/deviations. Notice that the population standard deviation, σ, appears in the numerator of the standard error of the sample mean, $\sigma / \sqrt{n}$. As σ (numerator)
