The median is another way to measure the center of a numerical data set (besides the good old standby, the average). A statistical median is much like the median of an interstate highway. On a highway, the median is the middle of the road, and an equal number of lanes lie on either side of the median. In a numerical data set, the
median
is the point at which there are an equal number of data points whose values lie above and below the median value. Thus, the median is truly the middle of the data set. See
Chapter 5
for more on the median.
REMEMBER | The next time you hear an average reported, look to see whether the median is also reported. If not, ask for it! The average and the median are two different representations of the middle of a data set and can often tell two very different stories about the data. |
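To see how far apart the average and the median can land, here is a minimal Python sketch. The salary figures are made up for illustration; one unusually large value pulls the average up while the median stays put.

```python
from statistics import mean, median

# Hypothetical salaries (in thousands of dollars); the last one is an outlier
salaries = [32, 35, 38, 40, 42, 45, 250]

average = mean(salaries)   # sum of the values divided by how many there are
middle = median(salaries)  # the value with equal counts above and below it

print(f"average = {average:.1f}")  # about 68.9
print(f"median  = {middle}")       # 40
```

Three salaries fall below 40 and three fall above it, so 40 is the median, even though the average is nearly 69.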
Have you heard anyone report that a certain result was found to be "2 standard deviations above the mean"? More and more, people want to report how significant their results are, and the number of standard deviations above or below average is one way to do it. But exactly what
is
a standard deviation?
The
standard deviation
is a measure statisticians use for the amount of variability (or spread) among the numbers in a data set. As the term implies, a standard deviation is a standard (or typical) amount of deviation (or distance) from the average (or mean, as statisticians like to call it). So, the standard deviation, in very rough terms, is the average distance from the mean. See
Chapter 5
for calculations and more information.
The standard deviation is also used to describe where most of the data should fall, in a relative sense, compared to the average. For example, when the data are roughly bell-shaped, about 95% of the data will lie within two standard deviations of the mean. (This result is called the empirical rule. See
Chapter 8
for more on this.)
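TECHNICAL STUFF | The formula for standard deviation (denoted by $s$) is $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$, where $x$ stands for each value in the data set, $\bar{x}$ is the average of the values, and $n$ is the number of values. |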
For detailed instructions on calculating the standard deviation, see
Chapter 5
.
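As a rough illustration of the "typical distance from the mean" idea, the sketch below computes a sample standard deviation two ways: by hand, dividing by n - 1, and with Python's built-in `statistics.stdev`. The data values are invented for the example. It then counts how many values lie within two standard deviations of the mean.

```python
from math import sqrt
from statistics import mean, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values for illustration

x_bar = mean(data)

# Sample standard deviation by hand: squared distances from the mean,
# summed, divided by n - 1, then square-rooted
s_manual = sqrt(sum((x - x_bar) ** 2 for x in data) / (len(data) - 1))

# The standard library computes the same quantity
s_library = stdev(data)

# Share of values within two standard deviations of the mean
within_two = [x for x in data if abs(x - x_bar) <= 2 * s_library]
share = len(within_two) / len(data)

print(f"mean = {x_bar}, s = {s_library:.2f}, share within 2 s = {share:.0%}")
```

With so few values, the share within two standard deviations will not necessarily be close to 95%; here every value happens to fall inside that range.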
HEADS UP | The standard deviation is an important statistic, but it is often absent when statistical results are reported. Without it, you're getting only part of the story about the data. Statisticians like to tell the story about the man who had one foot in a bucket of ice water and the other foot in a bucket of boiling water. He said that, on average, he felt just great! But think about the variability in the two temperatures for each of his feet. Closer to home, the average house price, for example, tells you nothing about the range of house prices you may encounter when house-hunting. The average salary may not fully represent what's really going on in your company, if the salaries are extremely spread out. |
REMEMBER | Don't be satisfied with finding out only the average — be sure to ask for the standard deviation, as well. Without a standard deviation, you have no way of knowing how spread out the values may be. (If you're talking starting salaries, for example, this could be very important!) |
You've probably heard references to percentiles before. If you've taken any kind of standardized test, you know that when your score was reported, it was presented to you with a measure of where you stood, compared to the other people who took the test. This comparison measure was most likely reported to you in terms of a percentile. The
percentile
reported for a given score is the percentage of values in the data set that fall below that certain score. For example, if your score was reported to be at the 90th percentile, that means that 90% of the other people who took the test with you scored lower than you did (and 10% scored higher than you did). For more specifics on percentiles, see
Chapter 5
.
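The definition above translates almost word for word into code. This is a minimal sketch using invented test scores; `percentile_rank` is a hypothetical helper name, and it counts only values strictly below the given score. Other conventions exist, for example ones that give partial credit for ties.

```python
def percentile_rank(scores, score):
    """Percentage of values in `scores` that fall below `score`."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)

# Hypothetical test scores for ten people
scores = [55, 60, 62, 68, 70, 71, 75, 80, 88, 95]

rank = percentile_rank(scores, 88)
print(f"A score of 88 is at the {rank:.0f}th percentile")  # 80th
```

Eight of the ten scores are below 88, so a score of 88 sits at the 80th percentile of this small group.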
REMEMBER | Percentiles are used in a variety of ways for comparison purposes and to determine |
The standard score is a slick way to put results in perspective without having to provide a lot of details — something that the media loves. The
standard score
represents the number of standard deviations above or below the mean (without caring what that standard deviation or mean actually is).
As an example, suppose Bob took his statewide 10th-grade test recently, and scored 400. What does that mean? It may not mean much to you because you can't put that 400 into perspective. But knowing that Bob's standard score on the test is +2 tells you everything. It tells you that Bob's score is 2 standard deviations above the mean. (Bravo, Bob!) Now suppose Bill's standard score is −2. In this case, this is not good (for Bill), because it means Bill's score is 2 standard deviations
below
the mean.
The formula for standard score is
$$\text{standard score} = \frac{\text{score} - \bar{x}}{s}$$
where
$\bar{x}$ is the average of all the scores
$s$ is the standard deviation of all the scores
For the details on calculating and interpreting standard scores, see
Chapter 8
.
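Counting standard deviations from the mean takes one line of arithmetic. In the sketch below, the mean of 300 and standard deviation of 50 are assumptions chosen so that Bob's 400 works out to +2; the text doesn't give the actual figures for his test. Bill's score of 200 is likewise hypothetical.

```python
def standard_score(score, mean, sd):
    """Number of standard deviations that `score` lies above (+) or below (-) the mean."""
    return (score - mean) / sd

# Assumed test-wide figures (not given in the text)
test_mean = 300
test_sd = 50

bob_z = standard_score(400, test_mean, test_sd)
bill_z = standard_score(200, test_mean, test_sd)

print(f"Bob:  {bob_z:+.1f}")   # +2.0
print(f"Bill: {bill_z:+.1f}")  # -2.0
```

Notice that once you have the standard score, you no longer need the mean or the standard deviation to see where someone stands.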
When numerical data are organized, they're often ordered from smallest to largest, broken into reasonably sized groups, then put into graphs and charts to examine the shape, or distribution, of the data. The most common type of data distribution is called the
bell-shaped curve
, in which most of the data are centered around the average in a big lump, and as you move farther out on either side of the mean, you find fewer and fewer data points.
Figure 3-1
shows a picture of a bell-shaped curve; notice that the shape of the curve resembles the outline of an old-fashioned bell.
Statisticians have another name for the bell-shaped curve when many possible values for the data exist; they call it the
normal distribution.
This distribution is used to describe data that follow a bell-shaped pattern, including what the range of values is expected to be and where an individual score stands in relation to the others. For example, if the data have a normal distribution, you can expect about 95% of the data to lie within two standard deviations of the mean. Because every distinct population of data has a different mean and standard deviation, an infinite number of different normal distributions exist, each with its own mean and its own standard deviation to characterize it. See
Chapter 8
for plenty more information on the normal distribution.
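If you'd like to check the "within two standard deviations" claim yourself, Python's standard library includes a `NormalDist` class (Python 3.8 and later). The sketch below is one way to do it; `share_within` is a hypothetical helper, and the mean of 300 and standard deviation of 50 are made-up numbers. The point is that the answer is the same for any normal distribution.

```python
from statistics import NormalDist

def share_within(k, mu, sigma):
    """Proportion of a normal distribution lying within k standard deviations of its mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

standard = share_within(2, mu=0, sigma=1)    # the standard normal distribution
shifted = share_within(2, mu=300, sigma=50)  # a different, made-up normal distribution

print(f"{standard:.4f}")  # roughly 0.9545
print(f"{shifted:.4f}")   # the same proportion
```

The exact proportion is a bit more than 95%, which is why the figure is usually quoted as "about 95%."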
TECHNICAL STUFF | The normal distribution is also used to help measure the accuracy of many statistics, including the mean, using an important result in statistics called the central limit theorem. |
HEADS UP | If a data set has a normal distribution, and you standardize all of the data to obtain standard scores, those standard scores are called Z-values. Z-values have what is known as a standard normal distribution (or Z-distribution). The |
An
experiment
is a study that imposes a certain amount of control on the study's subjects and their environment (for example, restricting their diets, giving them certain dosage levels of a drug or placebo, or asking them to stay awake for a prescribed period of time). The purpose of most experiments is to pinpoint a cause-and-effect relationship between two variables (such as alcohol consumption and impaired vision). Here are some of the questions that experiments try to answer:
Does taking zinc help reduce the duration of a cold? Some studies show that it does.
Do the shape and position of your pillow affect how well you sleep at night? The Emory Spine Center in Atlanta says, "Yes."
Does shoe heel height affect foot comfort? A study done at UCLA says heels up to one inch high are better than flat soles.
In this section, you find more information about how experimental studies are (or should be) conducted. And
Chapter 17
is entirely dedicated to the subject. For now, just concentrate on the basic lingo relating to experiments.
Most experiments try to determine whether some type of treatment (or important factor) has some sort of effect on an outcome. For example, does zinc help to reduce the length of a cold? Subjects who are chosen to participate in the experiment are typically divided into two groups, a treatment group and a control group. The
treatment group
consists of those who receive the treatment that supposedly has an effect on the outcome (in this case, zinc). The
control group
consists of those who do not receive the treatment, or those who receive a standard, well-known treatment whose results will be compared with this new treatment (such as vitamin C, in the case of the zinc study).
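One common way to divide subjects into the two groups is to assign them at random. The text here doesn't say how the division is made, so treat random assignment as an assumption of this sketch. The subject labels and the `assign_groups` helper are hypothetical.

```python
import random

def assign_groups(subjects, seed=None):
    """Shuffle the subjects and split them into a treatment group and a control group."""
    rng = random.Random(seed)   # a seed makes the assignment reproducible
    shuffled = list(subjects)   # copy, so the original list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Twenty hypothetical participants in the zinc study
subjects = [f"subject_{i:02d}" for i in range(1, 21)]

treatment, control = assign_groups(subjects, seed=42)

print(len(treatment), "subjects receive zinc")
print(len(control), "subjects receive the comparison treatment")
```

Every subject ends up in exactly one group, and chance alone decides which one.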
A
placebo
is a fake treatment, such as a sugar pill. It is often given to the members of the control group, so that they will not know whether they are taking the treatment (for example, zinc) or receiving no treatment at all. Placebos are given to the control group in order to control for a phenomenon called the
placebo effect
, in which patients who receive any sort of perceived treatment by taking a pill (even though it's a sugar pill) report some sort of result, be it positive ("Yes, I feel better already") or negative ("Wow, I am starting to feel a bit dizzy"), due to a psychological effect. Without a placebo, the researchers could not be certain that the results were due to the actual effect of the treatment, because some (or all) of the observed effect could have been due to the placebo effect.
A
blind experiment
is one in which the subjects who are participating in the study are not aware of whether they're in the treatment group or the control group. In the zinc example, a placebo would be used that would look like the zinc pill, and patients would not be told which type of pill they were taking. A blind experiment attempts to eliminate any bias in what the study subjects might report.
A
double-blind experiment
controls for potential bias on the part of both the patients and the researchers. Neither the patients nor the researchers collecting the data know which subjects received the treatment and which ones didn't. A double-blind study is best, because even though researchers may claim to be unbiased, they often have a special interest in the results — otherwise they wouldn't be doing the study!