Statistics for Dummies (11 page)

Authors: Deborah Jean Rumsey

Median

The median is another way to measure the center of a numerical data set (besides the good old standby, the average). A statistical median is much like the median of an interstate highway. On a highway, the median is the middle of the road, and an equal number of lanes lie on either side of it. In a numerical data set, the median is the point at which an equal number of data points have values above and below the median value. Thus, the median is truly the middle of the data set. See Chapter 5 for more on the median.
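
To make the idea concrete, here is a minimal Python sketch. The salary figures are made up for illustration and do not come from the book. It finds the median with the standard library and prints the average alongside it for comparison.

```python
from statistics import mean, median

# Hypothetical salaries (in thousands of dollars); one value is far larger than the rest.
salaries = [32, 35, 38, 40, 41, 45, 250]

middle = median(salaries)   # the value with as many data points below it as above it
average = mean(salaries)    # the sum of the values divided by how many there are

print(f"median = {middle}")
print(f"mean   = {average:.1f}")
```

Three salaries lie below the median and three lie above it, while the single large salary pulls the average well above the middle of the data.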

REMEMBER

The next time you hear an average reported, look to see whether the median is also reported. If not, ask for it! The average and the median are two different representations of the middle of a data set and can often tell two very different stories about the data.

Standard deviation

Have you heard anyone report that a certain result was found to be "2 standard deviations above the mean"? More and more, people want to report how significant their results are, and the number of standard deviations above or below average is one way to do it. But exactly what is a standard deviation?

The standard deviation is a measure statisticians use for the amount of variability (or spread) among the numbers in a data set. As the term implies, a standard deviation is a standard (or typical) amount of deviation (or distance) from the average (or mean, as statisticians like to call it). So, the standard deviation, in very rough terms, is the average distance from the mean. See Chapter 5 for calculations and more information.

The standard deviation is also used to describe where most of the data should fall, in a relative sense, compared to the average. For example, in many cases, about 95% of the data will lie within two standard deviations of the mean. (This result is called the empirical rule. See Chapter 8 for more on this.)

TECHNICAL STUFF

The formula for standard deviation (s) is as follows:

$$ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} $$

where
n = the number of values in the data set
x̄ = the average of all the values
x = each value in the data set

For detailed instructions on calculating the standard deviation, see Chapter 5.
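
As a rough sketch of the calculation, the Python below assumes the usual sample formula: square each value's distance from the average, add those up, divide by n − 1, and take the square root. The function name and the data values are illustrative only.

```python
import math
from statistics import stdev

def sample_standard_deviation(values):
    """Standard deviation using the n - 1 (sample) formula."""
    n = len(values)                      # number of values in the data set
    x_bar = sum(values) / n              # average of all the values
    squared_distances = [(x - x_bar) ** 2 for x in values]
    return math.sqrt(sum(squared_distances) / (n - 1))

# Hypothetical data set, chosen only to keep the arithmetic easy to follow.
data = [2, 4, 4, 4, 5, 5, 7, 9]

s = sample_standard_deviation(data)
print(f"s = {s:.3f}")
print(f"statistics.stdev gives {stdev(data):.3f}")  # same formula, from the standard library
```

The hand-written function and the standard library's `stdev` should agree, which is a handy check when you work through the steps yourself.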

HEADS UP

The standard deviation is an important statistic, but it is often absent when statistical results are reported. Without it, you're getting only part of the story about the data. Statisticians like to tell the story about the man who had one foot in a bucket of ice water and the other foot in a bucket of boiling water. He said that, on average, he felt just great! But think about the variability in the two temperatures for each of his feet. Closer to home, the average house price, for example, tells you nothing about the range of house prices you may encounter when house-hunting. The average salary may not fully represent what's really going on in your company, if the salaries are extremely spread out.
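
Here is a small Python sketch of the same point, using two made-up lists of salaries (in thousands of dollars) that share an average but are spread out very differently. The company names and numbers are assumptions for illustration.

```python
from statistics import mean, stdev

# Hypothetical salaries at two companies, in thousands of dollars.
company_a = [48, 49, 50, 51, 52]   # everyone earns close to the average
company_b = [20, 30, 50, 70, 80]   # same average, far more spread

for name, salaries in (("A", company_a), ("B", company_b)):
    print(f"Company {name}: average = {mean(salaries)}, "
          f"standard deviation = {stdev(salaries):.1f}")
```

Both averages come out the same, so only the standard deviation reveals how different the two companies really are.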

REMEMBER

Don't be satisfied with finding out only the average — be sure to ask for the standard deviation, as well. Without a standard deviation, you have no way of knowing how spread out the values may be. (If you're talking starting salaries, for example, this could be very important!)

Percentile

You've probably heard references to percentiles before. If you've taken any kind of standardized test, you know that when your score was reported, it was presented to you with a measure of where you stood, compared to the other people who took the test. This comparison measure was most likely reported to you in terms of a percentile. The percentile reported for a given score is the percentage of values in the data set that fall below that certain score. For example, if your score was reported to be at the 90th percentile, that means that 90% of the other people who took the test with you scored lower than you did (and 10% scored higher than you did). For more specifics on percentiles, see Chapter 5.
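
The definition above translates almost directly into code. This is a minimal sketch with invented test scores. The function name `percent_below` is hypothetical, and real testing agencies may handle ties and rounding differently.

```python
def percent_below(scores, score):
    """Percentage of values in the data set that fall below the given score."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)

# Hypothetical scores for ten test takers.
test_scores = [55, 60, 62, 68, 70, 74, 79, 83, 88, 95]

top = percent_below(test_scores, 95)
print(f"A score of 95 sits at about the {top:.0f}th percentile of this group")
```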

REMEMBER

Percentiles are used in a variety of ways for comparison purposes and to determine relative standing (that is, how an individual data value compares to the rest of the group). Babies' weights are often reported in terms of percentiles, for example. Percentiles are also used by companies to get a handle on where they stand compared to other companies in terms of sales, profits, customer satisfaction, and so on.

Standard score

The standard score is a slick way to put results in perspective without having to provide a lot of details, which is something the media loves. The standard score represents the number of standard deviations a value lies above or below the mean (without caring what that standard deviation or mean actually is).

As an example, suppose Bob took his statewide 10th-grade test recently, and scored 400. What does that mean? It may not mean much to you because you can't put that 400 into perspective. But knowing that Bob's standard score on the test is +2 tells you everything. It tells you that Bob's score is 2 standard deviations above the mean. (Bravo, Bob!) Now suppose Bill's standard score is −2. In this case, this is not good (for Bill), because it means Bill's score is 2 standard deviations below the mean.

The formula for standard score is

$$ \text{standard score} = \frac{\text{original score} - \bar{x}}{s} $$

where
x̄ is the average of all the scores
s is the standard deviation of all the scores

For the details on calculating and interpreting standard scores, see Chapter 8.
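
As a quick sketch of the arithmetic in Python: the text gives Bob's score (400) and his standard score (+2), but not the test's mean or standard deviation. The values of 300 and 50 below are assumptions picked so the numbers work out, and Bill's raw score of 200 is likewise invented.

```python
def standard_score(score, mean, standard_deviation):
    """Number of standard deviations a score lies above (+) or below (-) the mean."""
    return (score - mean) / standard_deviation

# Assumed values, not given in the text: a test mean of 300 and a standard deviation of 50.
TEST_MEAN = 300
TEST_SD = 50

bob = standard_score(400, TEST_MEAN, TEST_SD)    # Bob's score of 400 comes from the text
bill = standard_score(200, TEST_MEAN, TEST_SD)   # hypothetical raw score for Bill

print(f"Bob:  {bob:+.1f}")
print(f"Bill: {bill:+.1f}")
```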

Normal distribution (or bell-shaped curve)

When numerical data are organized, they're often ordered from smallest to largest, broken into reasonably sized groups, then put into graphs and charts to examine the shape, or distribution, of the data. The most common type of data distribution is called the bell-shaped curve, in which most of the data are centered around the average in a big lump, and as you move farther out on either side of the mean, you find fewer and fewer data points. Figure 3-1 shows a picture of a bell-shaped curve; notice that the shape of the curve resembles the outline of an old-fashioned bell.

Figure 3-1: Bell-shaped curve.

Statisticians have another name for the bell-shaped curve when many possible values for the data exist; they call it the normal distribution. This distribution is used to describe data that follow a bell-shaped pattern, including what the range of values is expected to be and where an individual score stands in relation to the others. For example, if the data have a normal distribution, you can expect about 95% of the data to lie within two standard deviations of the mean. Because every distinct population of data has a different mean and standard deviation, an infinite number of different normal distributions exist, each with its own mean and its own standard deviation to characterize it. See Chapter 8 for plenty more information on the normal distribution.
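
To check the "within two standard deviations" claim for yourself, here is a minimal sketch using `NormalDist` from Python's standard library (Python 3.8 or later). The helper name `share_within` and the example mean and standard deviation are assumptions for illustration.

```python
from statistics import NormalDist

def share_within(k, mean=0.0, sd=1.0):
    """Fraction of a normal distribution lying within k standard deviations of its mean."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(mean + k * sd) - dist.cdf(mean - k * sd)

within_two = share_within(2)
print(f"Within 2 standard deviations: {within_two:.1%}")

# The share is the same for any normal distribution, whatever its mean and standard deviation.
same_share = share_within(2, mean=100, sd=15)
print(f"With mean 100 and standard deviation 15: {same_share:.1%}")
```

Every normal distribution gives the same answer, roughly 95%, which is why one rule of thumb can cover an infinite number of different normal distributions.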

TECHNICAL STUFF

The normal distribution is also used to help measure the accuracy of many statistics, including the mean, using an important result in statistics called the central limit theorem. This theorem gives you the ability to measure how much your sample mean will vary, without having to take any other sample means to compare it with (thankfully!). It basically says that your sample mean has an approximately normal distribution, no matter what the distribution of the original data looks like (as long as your sample size was large enough). See Chapter 9 for more on the central limit theorem (known by statisticians as the "crown jewel of all statistics." Should you even bother to tell them to get a life?).
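
The theorem spares you from taking lots of samples in real life, but a computer can take them cheaply to show what it promises. The sketch below is a simulation under assumed settings (die rolls as the original data, samples of size 50, 2,000 repetitions, a fixed random seed). None of these come from the book.

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the sketch gives the same result each run

def sample_means(sample_size, num_samples):
    """Average of `sample_size` fair die rolls, repeated `num_samples` times."""
    # Single die rolls are flat (each face equally likely), not bell-shaped.
    return [mean(random.randint(1, 6) for _ in range(sample_size))
            for _ in range(num_samples)]

means = sample_means(sample_size=50, num_samples=2000)

center = mean(means)    # should land near 3.5, the average of a fair die
spread = stdev(means)   # how much the sample mean varies from sample to sample

# Share of sample means within two standard deviations of their center.
within_two = sum(abs(m - center) <= 2 * spread for m in means) / len(means)

print(f"center = {center:.2f}, spread = {spread:.2f}, within 2 SDs = {within_two:.1%}")
```

Even though individual die rolls look nothing like a bell, the sample means pile up around 3.5, and roughly 95% of them fall within two standard deviations, just as a normal distribution would predict.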

HEADS UP

If a data set has a normal distribution, and you standardize all of the data to obtain standard scores, those standard scores are called Z-values. Z-values have what is known as a standard normal distribution (or Z-distribution). The standard normal distribution is a special normal distribution with a mean equal to 0 and a standard deviation equal to 1. The standard normal distribution is useful for examining the data and determining statistics like percentiles, or the percentage of the data falling between two values. So if researchers determine that the data have a normal distribution, they will usually first standardize the data (by converting each data point into a Z-value), and then use the standard normal distribution to explore and discuss the data in more detail.
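
The two-step routine described here (standardize first, then consult the standard normal distribution) can be sketched in Python as follows. The test mean of 300 and standard deviation of 50 are assumed values, and the helper names are hypothetical.

```python
from statistics import NormalDist

# The standard normal (Z) distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def z_value(x, mean, sd):
    """Convert a data value into a Z-value (its standard score)."""
    return (x - mean) / sd

def percent_between(low, high, mean, sd):
    """Percentage of normally distributed data falling between two values."""
    z_low, z_high = z_value(low, mean, sd), z_value(high, mean, sd)
    return 100 * (standard_normal.cdf(z_high) - standard_normal.cdf(z_low))

# Assumed for illustration: scores are normal with mean 300 and standard deviation 50.
percentile_of_400 = 100 * standard_normal.cdf(z_value(400, 300, 50))
middle_share = percent_between(250, 350, 300, 50)

print(f"A score of 400 is at about the {percentile_of_400:.1f}th percentile")
print(f"About {middle_share:.1f}% of scores fall between 250 and 350")
```

Standardizing first means a single distribution can answer these questions for any normal data set, which is the convenience the text is pointing to.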

Experiments

An experiment is a study that imposes a certain amount of control on the study's subjects and their environment (for example, restricting their diets, giving them certain dosage levels of a drug or placebo, or asking them to stay awake for a prescribed period of time). The purpose of most experiments is to pinpoint a cause-and-effect relationship between two variables (such as alcohol consumption and impaired vision). Here are some of the questions that experiments try to answer:

Does taking zinc help reduce the duration of a cold? Some studies show that it does.
Do the shape and position of your pillow affect how well you sleep at night? The Emory Spine Center in Atlanta says, "Yes."
Does shoe heel height affect foot comfort? A study done at UCLA says heels of up to one inch are better than flat soles.

In this section, you find more information about how experimental studies are (or should be) conducted, and Chapter 17 is entirely dedicated to the subject. For now, just concentrate on the basic lingo relating to experiments.

Treatment group versus control group

Most experiments try to determine whether some type of treatment (or important factor) has some sort of effect on an outcome. For example, does zinc help to reduce the length of a cold? Subjects who are chosen to participate in the experiment are typically divided into two groups, a treatment group and a control group. The treatment group consists of those who receive the treatment that supposedly has an effect on the outcome (in this case, zinc). The control group consists of those who do not receive the treatment, or those who receive a standard, well-known treatment whose results will be compared with this new treatment (such as vitamin C, in the case of the zinc study).

Placebo

A placebo is a fake treatment, such as a sugar pill. It is often given to the members of the control group, so that they will not know whether they are taking the treatment (for example, zinc) or receiving no treatment at all. Placebos are given to the control group in order to control for a phenomenon called the placebo effect, in which patients who receive any sort of perceived treatment by taking a pill (even though it's a sugar pill) report some sort of result, be it positive ("Yes, I feel better already") or negative ("Wow, I am starting to feel a bit dizzy"), due to a psychological effect. Without a placebo, the researchers could not be certain that the results were due to the actual effect of the treatment, because some (or all) of the observed effect could have been due to the placebo effect.

Blind and double-blind

A blind experiment is one in which the subjects who are participating in the study are not aware of whether they're in the treatment group or the control group. In the zinc example, a placebo would be used that would look like the zinc pill, and patients would not be told which type of pill they were taking. A blind experiment attempts to eliminate any bias in what the study subjects might report.

A double-blind experiment controls for potential bias on the part of both the patients and the researchers. Neither the patients nor the researchers collecting the data know which subjects received the treatment and which ones didn't. A double-blind study is best, because even though researchers may claim to be unbiased, they often have a special interest in the results (otherwise they wouldn't be doing the study!).
