Statistics for Dummies

Authors: Deborah Jean Rumsey

Models and simulations

Not all probabilities can be calculated using math. In cases when math won't work to calculate a probability, other methods can be used to estimate probabilities or to use known probabilities to make predictions about the world. For example, complicated computer models are used to predict the probability of a hurricane hitting the U.S. coast, and if so, when and where. These computer models are based on data from the behavior of past hurricanes, as well as current weather conditions and other variables. Scientists put the information into a sophisticated mathematical model that tries to predict what the hurricane will do. Work remains to be done in this area, but progress is being made all the time. Models like this would save lives, property, and millions of dollars in damage if people were able to know ahead of time what to expect and prepare accordingly.

Other models are based on observational data. The U.S. Census Bureau's American Community Survey surveyed households in Columbus, Ohio, in 2001 to get an idea of the makeup of the community. One of the characteristics examined was household composition (married couples, other families, people who live alone, and other non-family households). The data are summarized in Figure 6-1. These statistics from a sample of households can serve as a probability model for the makeup of all households in Columbus, Ohio, in 2001.

Figure 6-1: Household makeup for Columbus, Ohio, 2001.

For example, because 35% of the sampled households were married-couple households, you can say that the probability that a randomly selected household in Columbus is a married-couple household is 35%. You can also use the rules of probability to make other statements about the households in Columbus in 2001. For example, what's the probability that a randomly selected household contains any type of family? That would be the sum of the probabilities of selecting a married-couple household (35%) and a household that falls into the "other family" category (20%). So, the probability that a randomly selected household in Columbus, Ohio, in 2001 contains a family is 35% + 20% = 55%. (Therefore, the probability of selecting a non-family household is 100% − 55%, or 45%.)
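
The addition and complement rules used above can be checked in a few lines of Python. This is only a minimal sketch: the two percentages are the ones quoted in the text, and the variable names are illustrative.

```python
from fractions import Fraction

# Percentages quoted in the text for Columbus, Ohio, 2001 (sample-based estimates).
p_married_couple = Fraction(35, 100)
p_other_family = Fraction(20, 100)

# Addition rule: a household falls into only one category, so the two
# family categories cannot overlap and their probabilities add.
p_family = p_married_couple + p_other_family

# Complement rule: a household either contains a family or it does not.
p_non_family = 1 - p_family

print(f"P(family)     = {float(p_family):.0%}")
print(f"P(non-family) = {float(p_non_family):.0%}")
```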

HEADS UP

The probability model in Figure 6-1 shouldn't be used for communities outside Columbus, Ohio, because in this survey, the sample of households was selected only from Columbus. Using these data to discuss a population other than the one from which the sample was drawn would be invalid. (See Chapter 16 for more on surveys and what they can and can't say about populations.)

Simulations are another way to estimate probability when a formula isn't possible. In a simulation, a process is repeated over and over again under the exact same conditions (usually using a computer), and the outcomes are recorded each time. The probability of any outcome is estimated by the percentage of times the outcome occurred in the simulations. For example, a sports fan with too much time on his hands simulated thousands of NCAA tournaments on his computer and used these simulations to predict that Duke would win the NCAA basketball championship in 2002 with a probability of over 95%. As luck would have it, the prediction turned out to be wrong (Duke was eliminated early in the tournament), proving that the only thing you can be certain about is uncertainty.
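
To make the idea concrete, here is a rough sketch of a simulation in Python. It is not the sports fan's model, which the text doesn't describe. It assumes a toy tournament in which one team wins each game independently with the same probability, and the 90% win rate, the six rounds, and the function name are all made up for illustration.

```python
import random

def simulate_title_probability(p_win_game, rounds, trials, seed=0):
    """Estimate the chance that a team wins every round of a knockout tournament."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    titles = 0
    for _ in range(trials):
        # The team takes the title only if it wins all of its games.
        if all(rng.random() < p_win_game for _ in range(rounds)):
            titles += 1
    # Estimated probability = share of simulated tournaments that ended in a title.
    return titles / trials

# Hypothetical inputs: a favorite that wins any single game 90% of the time
# and must win 6 games in a row to take the championship.
estimate = simulate_title_probability(p_win_game=0.9, rounds=6, trials=20_000)
exact = 0.9 ** 6  # known here only because the toy model is so simple

print(f"simulated: {estimate:.3f}   exact: {exact:.3f}")
```

In a realistic model no simple formula like `0.9 ** 6` would exist, which is when the simulated estimate becomes the only number available.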

Interpreting Probability

A probability can be interpreted in two ways: as a short-term chance, or as a long-term percentage. In the short term, the probability of an event is the percentage chance that the event is going to happen on the next try. For example, your meteorologist may tell you that the probability of rain tomorrow is 40%. Or a baseball player's overall batting average may be 0.291 (meaning he has on average a 29.1% chance of getting a hit the next time he comes up to bat).

Probability also means the percentage of times that an event will happen in the long run (over a long period of time with repeated trials under the same conditions). So a 40% chance of rain tomorrow can be taken to mean that if you look at data from a large number of days similar to the type of day tomorrow is supposed to be, it rained on 40% of those days. The baseball player's 0.291 batting average can be interpreted as the proportion of times that he gets a hit averaged over many times at bat (in this case he's expected to get a hit 291 times out of 1,000 times at bat).
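
The long-run interpretation can be illustrated with a short Python sketch. It treats every at-bat as an independent trial with the same 0.291 chance of a hit, which is a simplifying assumption, and the function name is hypothetical.

```python
import random

def simulated_hit_rate(p_hit, at_bats, seed=1):
    """Simulate a number of at-bats and return the fraction that were hits."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(rng.random() < p_hit for _ in range(at_bats))
    return hits / at_bats

# 0.291 is the batting average used in the text.
for n in (10, 100, 1_000, 100_000):
    print(f"{n:>7} at-bats: hit rate = {simulated_hit_rate(0.291, n):.3f}")
```

With only a handful of at-bats the observed rate can land far from 0.291; it tends to settle near that value only as the number of at-bats grows.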

Avoiding Probability Misconceptions

The basic rules of probability seem pretty straightforward, but probability can often be counterintuitive. This section gives you some of the more common misconceptions people have about probability.

Looking more likely

If you were to write down a sequence of what you would think the outcomes of six flips of a fair coin (that is, a coin that hasn't been tampered with) would look like, you probably wouldn't think of writing down something like HTTTTH (where "H" means "heads" and "T" means "tails"), because that doesn't look very "random." However, this exact sequence of heads and tails has the same chance of happening as any other exact sequence. That's because the probability of getting a head is the same as the probability of getting a tail on each individual toss. Now, if you were to compare the probability of getting exactly two heads (out of six tosses) with the probability of getting six heads (out of six tosses), you'd get different values. The probability of getting two heads (out of six tosses) is higher because you have more ways of accomplishing it, instead of having to get a head every single time.
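
The counting behind this comparison can be sketched in Python, assuming a fair coin and independent flips (the variable names are illustrative):

```python
from math import comb

flips = 6
total_sequences = 2 ** flips  # 64 equally likely sequences for a fair coin

# Any one exact sequence, such as HTTTTH or HHHHHH, is 1 of the 64.
p_exact_sequence = 1 / total_sequences

# "Exactly two heads" covers every sequence with two H's: C(6, 2) = 15 of them.
p_two_heads = comb(flips, 2) / total_sequences

# "Six heads" covers only one sequence, HHHHHH.
p_six_heads = comb(flips, 6) / total_sequences

print(p_exact_sequence, p_two_heads, p_six_heads)
```

Two heads is more likely than six heads only because 15 different sequences count as "two heads," while each individual sequence remains equally likely.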

HEADS UP

With the lottery, the sequence 1, 2, 3, 4, 5, 6 has the same chance of winning as any other combination of six numbers, even though it doesn't look like it can ever occur. This fact makes you realize that all of the other combinations are just as unlikely to be chosen as this combination is. However, if you bet on this combo and win, you probably won't have to split the winnings with anyone.

Predicting long or short term

Probability works well for predicting long-term behavior, but it doesn't work well for predicting outcomes in the short term. In the long term, you know that unless the event has a probability of 0, it will happen sometime, and depending on how big the probability is, you can even get some idea of how long you can expect to wait. But you won't know exactly when the event will happen. That's what makes probability so interesting and what keeps gamblers coming back over and over again.

For example, if I flip a fair coin six times and get six heads in a row, what do you think the outcome of the next flip should be, a head or a tail? You may think I'm due to get a tail, so getting a tail now should have a higher chance of happening. But in fact, the probability of getting a tail on the next flip is still 1/2, the same as it was for each of the previous flips. You know that if the coin were flipped a large number of times, you could expect about 50% of the outcomes to be heads and 50% to be tails. But you can't predict when those heads or tails will appear on any given flip of the coin. (So even though it seems like a tail is due, the probability of getting a head or a tail on this next flip is still 50%.) Eventually, tails will start coming up, but you can't say when.
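
One way to convince yourself of this is to list every possible outcome of seven flips and look only at the ones that start with six heads. The Python sketch below does exactly that, assuming a fair coin so that all sequences are equally likely; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 2**7 = 128 equally likely outcomes of seven flips of a fair coin.
all_sequences = list(product("HT", repeat=7))

# Keep only the sequences that begin with six heads in a row.
after_six_heads = [seq for seq in all_sequences if seq[:6] == tuple("HHHHHH")]

# Among those, count how many finish with a tail on the seventh flip.
tails_next = sum(1 for seq in after_six_heads if seq[6] == "T")
p_tail_given_six_heads = Fraction(tails_next, len(after_six_heads))

print(p_tail_given_six_heads)
```

Only two sequences start with six heads, one ending in a head and one in a tail, so the streak gives the tail no extra edge.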

Thinking 50-50

One common misconception is to think that every situation with two possible outcomes is a "50-50" situation (in other words, a 50% chance that each of the two outcomes will occur, just as it was with the outcome of a single toss of a fair coin). Many people think that just because two outcomes are possible, each outcome must have a one out of two chance of occurring, but that's very often not the case. Not every situation is like a fair coin toss. Many situations have a higher probability of one outcome over the other.

For example, think of a computerized "walk/don't walk" sign on the crosswalk of a busy street. Is that sign going to say "walk" exactly 50% of the time? No. When the street is busy, the light will stop traffic less often, and pedestrians will have to wait longer between opportunities to cross the street. Using a sports example, think of a basketball player standing at the free throw line. Are her chances of making the basket 50-50? (After all, she either makes it or she doesn't.) Her chances are 50-50 only if her overall free throw percentage is 50% over many tries. Most likely, it's something higher than that.

Interpreting rare events

Probability can become a topic of controversy, especially in the case of rare events. A rare event has a small probability of happening, but what does that mean? It means that for any single situation or person, the event is unlikely to occur, yet if given enough repetitions of the situation over a long enough period of time or with enough people, the event is bound to happen to somebody, somewhere, sometime. This comes into play in situations where you have a cluster of people with a rare disease in one town, and you need to figure out whether something caused this to happen (the air, the water, the soil, and so on) or whether this just occurred by chance (something most people don't consider).
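
The "somebody, somewhere, sometime" effect follows from a simple formula: if an event has probability p on each of n independent opportunities, the chance that it happens at least once is 1 − (1 − p)^n. The Python sketch below uses made-up numbers (a one-in-a-million event and ten million people) and a hypothetical function name; real situations rarely have perfectly independent, identical chances.

```python
import math

def prob_at_least_once(p_single, opportunities):
    """P(event happens at least once) in independent, identical opportunities."""
    # 1 - (1 - p)^n, computed with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(opportunities * math.log1p(-p_single))

# Hypothetical numbers: a one-in-a-million event, for one person versus
# ten million people who each face the same independent chance.
one_person = prob_at_least_once(1e-6, 1)
many_people = prob_at_least_once(1e-6, 10_000_000)

print(f"one person:  {one_person:.6f}")
print(f"ten million: {many_people:.6f}")
```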

Because it doesn't seem very likely that a rare event would actually occur, people naturally want to blame the occurrence on something. In some cases, they'd be right; in other cases, this is just a phenomenon of random chance. Do three years in a row of rising average temperatures indicate global warming? If a dairy farm had two cows both give birth to two-headed calves in the same season, does that mean their cows have a terrible problem? How many tire blowouts should it take to constitute a tire recall? Looking at something after the fact and saying, "What was the chance of that happening here?" is different from knowing before the fact that the same event is bound to happen somewhere, sometime.

For example, if you flip a fair coin long enough, eventually, you come across a long string of heads, just by chance. That's supposed to happen sometime. And because the coin was fair, you couldn't blame it on anything but chance. However, the media may try to establish a pattern when they see two or more occurrences of an event, such as child abductions across the country, nightclub fires, or occurrences of a rare disease in the same city. I'm not saying these shouldn't be investigated for possible causal problems, but I am saying that the media needs to be aware that sometimes, events just happen in clumps by chance, with no big story behind them. It's also interesting to note that people view the probability of a rare event differently depending on whether the rare event is a good thing, like winning the lottery ("It's got to happen to somebody, so it may as well be me!") or a bad thing, like getting struck by lightning at a golf tournament ("That's a million-to-one shot. That can never happen to me!"). This may just be human nature. Mental note: Human nature doesn't correspond to the laws of probability.

REMEMBER

To avoid some of the more common probability misconceptions, keep the following in mind:

Probability isn't effective in predicting short-term behavior. It is effective when predicting long-term behavior.
In the case where only two outcomes are possible, each outcome doesn't necessarily have a 50% chance of occurring.
If a cluster of rare events occurs somewhere, it may have happened due to chance and no other reason. Rare events are going to happen to somebody, somewhere, sometime, given enough people and time.
You can't be "on a roll" if a process is being repeated over and over under the same conditions (as in a gaming situation). Probability has no memory.
Sequences of outcomes that "look more random" oftentimes have the same probability as sequences that don't "look as random." For example, you may think that HTTTTH has a smaller chance of occurring than HTTTHT does because it doesn't "look as random." In fact, they each have the same probability of occurring, because every exact sequence of six flips of a fair coin has the same probability, (1/2)^6 = 1/64, no matter how the heads and tails are arranged.

Connecting Probability with Statistics

You may be thinking, "Probability is interesting, but what does it have to do with statistics?" Good question. It may not seem obvious, but probability and statistics fit together like a hand in a glove. Data are collected from a sample of individuals, and then statistics are calculated to summarize those data. But you don't stop there. The next step is to use these statistics to make some sort of prediction, generalization, conclusion, or decision about the population that this sample came from. That's where probability comes in.

Estimating

Data are often collected in order to help estimate population proportions or averages. For example, doctors estimate the chance of someone having a heart attack by first gathering information about a patient's weight, body mass index, age, gender, genetic background, diet, exercise level, and so on. Then they compare the information to data that have been collected from a sample of people who have similar characteristics to the patient, and they come up with the patient's probability (or risk level) of having a heart attack in a given period of time. Engineers estimate the average number of cars that will be using a certain section of the interstate highway during rush hour by recording traffic data using technology in the pavement. After the data are collected, probability is used to determine how much the sample information is likely to vary from sample to sample, day to day, hour to hour, and so on.

Predicting

Statistics are involved in helping to make predictions of all kinds — everything from weather predictions and population size projections to the spread of disease or the future values of the stock market. Data are collected over a period of time and are analyzed to find a model that not only fits the data well, but also allows for some predictions to be made for the near future. Probability helps people using the models to assess how accurate those predictions are expected to be, given the data at hand. Probability also helps scientists determine what the most likely scenario is, given the data.

For example, the U.S. Census Bureau makes available its population projections for the total U.S. resident population. You can currently look at population projections all the way to the year 2100. In 2000, the projected population for 2003 was 282,798,000, and as of May 2003, the population of the United States (as shown on the U.S. Census Bureau's Web site) was already 291,065,455 and counting. So, the projection was already off by about 8.3 million people with much of the year left, but that's only 2.8% of the total population in this situation. Estimating the total population size of the United States in the future is a difficult job. It's hard enough to count the number of people living here right now! (By the way, according to the Census Bureau, the size of the U.S. resident population projected for the year 2100 is 570,954,000.)
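
The arithmetic behind the "8.3 million" and "2.8%" figures can be reproduced directly from the two numbers quoted above. In this small Python sketch the variable names are illustrative:

```python
projected_2003 = 282_798_000     # projection for 2003 made in 2000 (from the text)
observed_may_2003 = 291_065_455  # Census Bureau figure as of May 2003 (from the text)

# How far the projection fell short, in people and as a share of the observed total.
shortfall = observed_may_2003 - projected_2003
shortfall_pct = 100 * shortfall / observed_may_2003

print(f"shortfall: {shortfall:,} people ({shortfall_pct:.1f}% of the observed population)")
```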

Deciding

Many decision-making processes involve statistics and probability. Medical treatments are often decided in terms of the percentage of people that did well using that treatment, compared to others. The probability that the next person will do well on a treatment would be estimated from the percentage of other patients who did well on that treatment. Most liability forms you need to sign before having surgery outline the possible side effects or complications and give some indication of how often those happen. (See Chapter 17 for more on medical studies.)

Checking quality

Other decisions that involve probability occur during manufacturing processes. Many companies that manufacture items do some sort of quality control; that is, they sample products that come off of the line and assess product quality according to some set of specifications. Probability is used to decide whether and when the manufacturer needs to stop the process due to a problem with the quality of the products. Differences between the sampled product and the specifications may just be due to random variability or an unrepresentative sample; alternatively, these differences can mean that something is wrong with the process. Stopping the process unnecessarily costs money and time, but not stopping the process when it needs to be stopped costs the company in terms of customer satisfaction with the product. So probability is used to make some pretty important decisions in the manufacturing world. (See Chapter 19 for more on quality control.)
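
As a rough illustration of how probability can separate "random variability" from "something is wrong," the Python sketch below asks how often a healthy process would produce a bad-looking sample by chance alone. Everything specific here is an assumption made for the example: the 2% defect rate, the sample of 20 items, the binomial model, and the function name. It is not necessarily the method covered in Chapter 19.

```python
from math import comb

def prob_at_least_k_defects(n, p_defect, k):
    """P(k or more defective items in a sample of n) under a binomial model."""
    # Add up the chances of seeing 0, 1, ..., k-1 defects, then take the complement.
    p_fewer = sum(
        comb(n, i) * p_defect**i * (1 - p_defect) ** (n - i) for i in range(k)
    )
    return 1 - p_fewer

# Hypothetical process: 2% of items are defective when everything is working,
# and an inspector samples 20 items at a time.
p_three_or_more = prob_at_least_k_defects(n=20, p_defect=0.02, k=3)
print(f"P(3 or more defects by chance alone) = {p_three_or_more:.4f}")
```

If that probability is small, three defects in a single sample is hard to explain as random variability, which argues for checking the process rather than shrugging it off.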

REMEMBER

When generalizing results from a sample to a population, probability is used to assess the accuracy of those generalizations. Probability is used to determine which conclusion is most likely and why. When making a decision about a situation with an unknown outcome, you use probability to assess the evidence that has been collected, to make a choice based on that assessment, and to know the chance that you made the right or wrong decision. (See Chapter 14 for more information.)
