Statistics for Dummies

Author: Deborah Jean Rumsey

Tags: #Non-Fiction, #Reference

Walking through a Hypothesis Test: The Big Picture

Every hypothesis test contains a series of steps and procedures. This section gives you a general breakdown of what's involved. See Chapter 15 for details on the most commonly used hypothesis tests, including tests that examine a claim about a single population parameter, as well as those that compare two populations.

Reviewing the general steps for a hypothesis test (one mean/proportion, large samples)

Here's a boiled-down summary of the calculations involved in doing a hypothesis test. (Particular formulas needed to find test statistics for any of the most common hypothesis tests are provided in Chapter 15.)

  1. Set up the null and alternative hypotheses:

    1. The null hypothesis, Ho, says that the population parameter is equal to some claimed number.

    2. Three possible alternative hypotheses exist; choose the one that's most relevant in the case where the data don't support Ho:

      1. Ha: The population parameter is not equal (≠) to the claimed number.

      2. Ha: The population parameter is less than (<) the claimed number.

      3. Ha: The population parameter is greater than (>) the claimed number.

  2. Take a random sample of individuals from the population and calculate the sample statistic.

    This gives your best estimate of the population parameter (see Chapter 4).

  3. Convert the sample statistic to a test statistic by changing it to a standard score (all formulas for test statistics are provided in Chapter 15):

    1. Take your sample statistic minus the number in the null hypothesis. This is the distance between the claim and your results.

    2. Divide that distance by the standard error of your statistic (see Chapter 10 for more on standard error). This changes the distance to standard units.

  4. Find the p-value for your test statistic.

    1. Find the percentage chance of being at or beyond that value in the same direction:

      1. If Ha contains a less-than alternative, find the percentile from Table 8-1 in Chapter 8 that corresponds to your test statistic.

      2. If Ha contains a greater-than alternative, find the percentile from Table 8-1 (see Chapter 8) that corresponds to your test statistic, and then take 100% minus that percentile. (This gives you the percentage to the right of your test statistic.)

    2. Double this percentage if (and only if) Ha is the not-equal-to alternative.

    3. Change the percentage to a probability by dividing by 100 or by moving the decimal point two places to the left. This is your p-value.

  5. Examine your p-value and make your decision.

    1. Smaller p-values show more evidence against Ho. Conclude that Ho is false (in other words, reject the claim).

    2. Larger p-values show more evidence for Ho. Conclude that you can't reject Ho. Your sample doesn't provide enough evidence against the claim.

      What's the cutoff point between having or not having enough support for Ho? Most people find 0.05 to be a good cutoff point for accepting or rejecting Ho; p-values less than 0.05 show reasonable doubt that Ho is true. Your cutoff point is called the alpha (α) level.
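The five steps above can be sketched in a few lines of code. This is a minimal illustration for a large-sample test of one mean or proportion, not a formula from this book; the function name and the example numbers (a claimed value of 100, a sample mean of 103, a standard error of 1.5) are hypothetical, chosen just for demonstration.

```python
import math

def normal_cdf(z):
    # Standard normal CDF, written with math.erf so no extra libraries are needed
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def z_test_p_value(sample_stat, claimed_value, std_error, alternative):
    """Steps 3-4: convert the sample statistic to a standard score,
    then find the p-value for the chosen alternative hypothesis."""
    z = (sample_stat - claimed_value) / std_error   # distance in standard units
    if alternative == "less":
        p = normal_cdf(z)                   # area to the left of z
    elif alternative == "greater":
        p = 1 - normal_cdf(z)               # area to the right of z
    else:                                   # "not equal": double the one-tail area
        p = 2 * (1 - normal_cdf(abs(z)))
    return z, p

# Hypothetical example: Ho says the mean is 100; the sample mean is 103
# with a standard error of 1.5, and Ha is the not-equal-to alternative.
z, p = z_test_p_value(103, 100, 1.5, "not equal")
# z = 2.0 and p is about 0.0455, below the usual alpha of 0.05,
# so you would reject Ho (step 5).
```

Note how the "not equal" branch doubles the one-tail percentage, exactly as step 4 prescribes.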

TECHNICAL STUFF 

In a case where two populations are being compared, most researchers are interested in comparing the groups according to some parameter, such as the average weight of males versus females, or the proportion of women who oppose an issue compared to the proportion of men. In this case, the hypotheses are set up so you're looking at the difference between the averages or proportions, and the null hypothesis is that the difference is zero (the groups have the same means or proportions). Chapter 15 gives formulas and examples for these hypothesis tests for both the large and small sample size cases.
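As a sketch of the two-population case (not Chapter 15's exact formulas), a large-sample test of the difference between two means might look like this; all numbers and names here are hypothetical.

```python
import math

def normal_cdf(z):
    # Standard normal CDF via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def two_mean_z_test(mean1, sd1, n1, mean2, sd2, n2):
    """Large-sample test of Ho: the two population means are equal
    (their difference is zero), against a not-equal-to alternative."""
    diff = mean1 - mean2                        # observed difference; Ho claims 0
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)   # standard error of the difference
    z = (diff - 0) / se                         # distance from the claim, in standard units
    p = 2 * (1 - normal_cdf(abs(z)))            # two-sided p-value
    return z, p

# Hypothetical numbers: men averaging 180 lbs (sd 30, n = 100) versus
# women averaging 150 lbs (sd 25, n = 100).
z, p = two_mean_z_test(180, 30, 100, 150, 25, 100)
# z is about 7.7, so p is essentially 0 and Ho (equal means) is rejected.
```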

Dealing with other hypothesis tests

Many types of hypothesis tests are done in the world of research. The most common ones are included in Chapter 15 (along with easy-to-use formulas, step-by-step explanations, and examples). But many more types of tests exist, and their results come to you on an everyday basis: in sound bites, press releases, evening news broadcasts, and on the Internet.

While the hypothesis tests that researchers use can be quite varied, the main ideas (such as p-values and how to interpret those results) are the same.

REMEMBER 

The most important element that all hypothesis tests have in common is the p-value. All p-values have the same interpretation, no matter what test is done. So anywhere you see a p-value, you will know that a small p-value means the researcher found a "statistically significant" result, which means the null hypothesis was rejected.

HEADS UP 

You also know, regardless of which hypothesis test someone used, that any conclusions made are subject to the data collection and analysis having been done correctly. Even then, under the best of situations, the data could still be unrepresentative just by chance, or the truth could have been too hard to detect, and the wrong decision could be made. But that's part of what makes statistics such a fun subject: you never know if what you're doing is correct, but you always know that what you're doing is right; does that make sense?

Handling smaller samples: The t-distribution

For means/proportions, in the case where the sample size is small (and by small, I mean dropping below 30 or so), you have less information on which to base your conclusions. Another drawback is that you can't rely on the standard normal distribution (Z-distribution) to compare your test statistic, because the central limit theorem hasn't kicked in yet. (The central limit theorem requires sample sizes that are large enough for the results to average out to a bell-shaped curve; see Chapter 8 for more on this.) You already know you should disregard results that are based on very small sample sizes (especially those with a sample size of 1). So, what do you do in those in-between situations, in which the sample size isn't small enough to disregard and isn't large enough to use the standard normal distribution to weigh your evidence? You use a different distribution, called a t-distribution. (You may have heard of the term t-test before, in terms of hypothesis testing. This is where that term comes from.)

The t-distribution is basically a shorter, fatter version of the standard normal distribution (Z-distribution). The idea is, you should have to pay a penalty for having less information, and that penalty is a distribution that has fatter tails. To make a touchdown (getting into that magic 5% range where Ho is rejected) with a smaller sample size is going to mean having to go farther out, proving yourself more, and having stronger evidence than you normally would if you had a larger sample size. Figure 14-2 compares the standard normal distribution (Z-distribution) to a t-distribution.

Figure 14-2: Comparison of the standard normal (Z-) distribution and the t-distribution.

Each sample size has its own t-distribution. That's because the penalty for having a smaller sample size, like 5, is greater than the penalty for having a larger sample size, like 10 or 20. Smaller sample sizes have shorter, fatter t-distributions than the larger sample sizes. And as you may expect, the larger the sample size is, the more the t-distribution looks like a standard normal distribution (Z-distribution); and the point where they become very similar (similar enough for jazz or government work) is about the point where the sample size is 30. Figure 14-3 shows what different t-distributions look like for different sample sizes and how they all compare to the standard normal distribution (Z-distribution).

Figure 14-3: t-distributions for different sample sizes.

TECHNICAL STUFF 

Each t-distribution is distinguished by something statisticians call degrees of freedom. (Why they call it that is something that goes beyond this book.) When you're testing one population's mean and the sample size is n, the degrees of freedom for the corresponding t-distribution is n − 1. So, for example, if your sample size is 10, you use a t-distribution with 10 − 1 or 9 degrees of freedom, denoted t9, rather than a Z-distribution, to look up your test statistic. (For any test that uses the t-distribution, the degrees of freedom will be given in terms of a formula involving the sample sizes. See Chapter 15 for details.)
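As a quick sketch of the degrees-of-freedom bookkeeping (this uses the third-party SciPy library for the t-distribution; SciPy is an assumption here, not a tool from this book):

```python
from scipy.stats import t

n = 10
df = n - 1          # one-sample test: degrees of freedom = n - 1 = 9

# The 97.5th percentile of t with 9 degrees of freedom (the t9 row of
# Table 14-2) is about 2.262...
critical = t.ppf(0.975, df)

# ...which means the area to the right of 2.262 under t9 is 2.5%.
p_right = 1 - t.cdf(critical, df)   # 0.025
```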

The t-distribution makes you pay a penalty for having a small sample size. What's the penalty? A larger p-value than the one that the standard normal distribution would have given you for the same test statistic. That's because of the fatter tails on the t-distribution; a test statistic far out on the leaner Z-distribution has little area beyond it. But that same test statistic out on the fatter t-distribution has more fat (or area) beyond it, and that's exactly what the p-value represents. A bigger p-value means less chance of rejecting Ho. Having less data should create a higher burden of proof, so p-values do work the way you'd expect them to, after you figure out what you expect them to do!
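The penalty is easy to see numerically. Using SciPy (an assumption; any t-table or statistical software works equally well), the same test statistic yields a larger two-sided p-value under t9 than under the Z-distribution:

```python
from scipy.stats import norm, t

test_stat = 2.0     # the same standardized test statistic in both cases
df = 9              # e.g., a one-sample test with a sample size of 10

p_z = 2 * (1 - norm.cdf(test_stat))    # two-sided p-value from the Z-distribution
p_t = 2 * (1 - t.cdf(test_stat, df))   # two-sided p-value from t9

# p_z is about 0.046 (reject Ho at alpha = 0.05), while the fatter tails of
# t9 leave more area beyond 2.0, so p_t comes out larger (roughly 0.08) and
# Ho would not be rejected.
```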

Because each sample size would have to have its own t-distribution with its own t-table to find p-values, statisticians have come up with one abbreviated table that you can use to get a general feeling for your results (see Table 14-2). Computers can also give you a precise p-value for any sample size.

Table 14-2: t-Distribution

Degrees of Freedom   90th Pctile   95th Pctile   97.5th Pctile   99th Pctile   99.5th Pctile
1                    3.078         6.314         12.706          31.821        63.657
2                    1.886         2.920         4.303           6.965         9.925
3                    1.638         2.353         3.182           4.541         5.841
4                    1.533         2.132         2.776           3.747         4.604
5                    1.476         2.015         2.571           3.365         4.032
6                    1.440         1.943         2.447           3.143         3.707
7                    1.415         1.895         2.365           2.998         3.499
8                    1.397         1.860         2.306           2.896         3.355
9                    1.383         1.833         2.262           2.821         3.250
10                   1.372         1.812         2.228           2.764         3.169
11                   1.363         1.796         2.201           2.718         3.106
12                   1.356         1.782         2.179           2.681         3.055
13                   1.350         1.771         2.160           2.650         3.012
14                   1.345         1.761         2.145           2.624         2.977
15                   1.341         1.753         2.131           2.602         2.947
16                   1.337         1.746         2.120           2.583         2.921
17                   1.333         1.740         2.110           2.567         2.898
18                   1.330         1.734         2.101           2.552         2.878
19                   1.328         1.729         2.093           2.539         2.861
20                   1.325         1.725         2.086           2.528         2.845
21                   1.323         1.721         2.080           2.518         2.831
22                   1.321         1.717         2.074           2.508         2.819
23                   1.319         1.714         2.069           2.500         2.807
24                   1.318         1.711         2.064           2.492         2.797
25                   1.316         1.708         2.060           2.485         2.787
26                   1.315         1.706         2.056           2.479         2.779
27                   1.314         1.703         2.052           2.473         2.771
28                   1.313         1.701         2.048           2.467         2.763
29                   1.311         1.699         2.045           2.462         2.756
30                   1.310         1.697         2.042           2.457         2.750
40                   1.303         1.684         2.021           2.423         2.704
60                   1.296         1.671         2.000           2.390         2.660
Z-values             1.282         1.645         1.960           2.326         2.576
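If you have software handy, you can reproduce any entry of the abbreviated table exactly; here, for instance, is the 95th-percentile column (using SciPy, which is one option and an assumption here, not this book's tool):

```python
from scipy.stats import norm, t

# Reproduce a few rows of the 95th-percentile column of Table 14-2
for df in (1, 9, 30, 60):
    print(df, round(float(t.ppf(0.95, df)), 3))   # 6.314, 1.833, 1.697, 1.671

# As the degrees of freedom grow, the values shrink toward the Z-value
# in the table's bottom row:
z_95 = norm.ppf(0.95)   # about 1.645
```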
