Statistics Essentials For Dummies (45 page)

Authors: Deborah Rumsey

The slope, m, for the best-fitting line for the subset of cricket chirp versus temperature data is m = r(s_y / s_x) = 0.90.

So, as the number of chirps increases by 1 chirp per 15 seconds, the temperature is expected to increase by 0.90 degrees Fahrenheit on average. To get a more practical interpretation, you can multiply the top and bottom of the slope by 10 to get 9.0/10 and say that as chirps increase by 10 (per 15 seconds), temperature increases 9 degrees Fahrenheit.

Now, to find the y-intercept, b, you take ȳ - m x̄ (the mean of the y-values minus m times the mean of the x-values), or 67 - (0.90) * (26.5) = 43.15. So the best-fitting line for predicting temperature from cricket chirps based on the data is y = 0.90x + 43.15, or temperature (in degrees Fahrenheit) = 0.90 * (number of chirps in 15 seconds) + 43.15. The y-intercept would try to predict temperature when there is no chirping going on at all. However, no data was collected at or near this point, so we can't make predictions for temperature in this area. You can't predict temperature using crickets if the crickets are silent.
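The intercept arithmetic above can be checked with a few lines of Python. This is only a sketch: it assumes that 67 and 26.5 are the mean temperature and mean chirp count for the subset, and the names intercept, mean_chirps, and mean_temp are made up for illustration.

```python
# Summary values quoted in the text for the cricket subset.
# Assumption: 26.5 is the mean of x (chirps per 15 seconds) and
# 67 is the mean of y (temperature in degrees Fahrenheit).
mean_chirps = 26.5
mean_temp = 67.0
slope = 0.90  # m, as given in the text


def intercept(mean_x, mean_y, m):
    """Return the y-intercept of the best-fitting line: b = mean_y - m * mean_x."""
    return mean_y - m * mean_x


b = intercept(mean_chirps, mean_temp, slope)

# Print the fitted line, rounded to two decimals for readability.
print(f"temperature = {slope:.2f} * chirps + {b:.2f}")
```

Rounded to two decimals, the result matches the 43.15 worked out by hand.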

Making Predictions

After you have a strong linear relationship, and you find the equation of the best-fitting line y = mx + b, you use that line to predict y for a given x-value. This amounts to plugging the x-value into the equation and solving for y. For example, if your equation is y = 2x + 1, and you want to predict y for x = 1, then plug 1 into the equation for x to get y = 2(1) + 1 = 3.

Remember that you choose the values of X (the explanatory variable) that you plug in; what you predict is Y, the response variable, which totally depends on X. By doing this, you are using one variable that you can easily collect data on to predict a Y variable that is difficult or not possible to measure; this works well as long as X and Y are correlated. That's the big idea of regression.

From the previous section, the best-fitting line for the crickets is y = 0.90x + 43.15. Say you're camping, listening to crickets, and you remember that you can predict temperature by counting chirps. You count 35 chirps in 15 seconds. You put in 35 for x and find y = 0.90(35) + 43.15 = 74.65 degrees F. (Yeah, you memorized the formula just in case you needed it.) So, because crickets chirped 35 times in 15 seconds, you figure the temperature is probably about 75 degrees Fahrenheit.
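If you'd rather let a computer do the plugging in, the campsite prediction can be sketched in Python. The function name predict_temperature is hypothetical; the slope and intercept are the ones from the cricket line in the text.

```python
SLOPE = 0.90       # m: degrees Fahrenheit per chirp (per 15 seconds)
INTERCEPT = 43.15  # b: y-intercept of the best-fitting line


def predict_temperature(chirps_per_15_sec):
    """Plug an x-value (chirp count) into y = mx + b and return y."""
    return SLOPE * chirps_per_15_sec + INTERCEPT


# The campsite example: 35 chirps counted in 15 seconds.
predicted = predict_temperature(35)
print(f"Predicted temperature: {predicted:.2f} degrees F")
```

The printed value, rounded to two decimals, is the same 74.65 degrees found by hand.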

Avoid Extrapolation!

Just because you have a model doesn't mean you can plug in any value for X and do a good job of predicting Y. For example, in the chirping data, there is no data collected for less than 18 chirps or more than 39 chirps per 15 seconds (refer back to Table 10-1). If you try to make predictions outside this range you're going into uncharted territory; the farther outside this range you go with your x-values, the more dubious your predictions for y will get. Who's to say the line still works outside of the area where data were collected? Do you think crickets will chirp faster and faster without limit? At some point they would either pass out or burn up!

Making predictions using x-values that fall outside the range of your data is a no-no. Statisticians call this extrapolation; watch for researchers who try to make claims beyond the range of their data.
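One way to keep yourself honest is to build the data range into the prediction itself. The sketch below refuses to predict outside the 18-to-39 chirp range mentioned for the cricket data. The names MIN_CHIRPS, MAX_CHIRPS, and predict_temperature are hypothetical, and raising an error is just one possible design choice; a warning would work too.

```python
# Range of x-values actually observed in the chirping data (from the text).
MIN_CHIRPS = 18
MAX_CHIRPS = 39


def predict_temperature(chirps):
    """Predict temperature from y = 0.90x + 43.15, but only inside the data range."""
    if not MIN_CHIRPS <= chirps <= MAX_CHIRPS:
        # Outside the observed range the line is untested, so refuse to extrapolate.
        raise ValueError(
            f"{chirps} chirps is outside the observed range "
            f"{MIN_CHIRPS} to {MAX_CHIRPS}; predicting here would be extrapolation"
        )
    return 0.90 * chirps + 43.15


print(f"{predict_temperature(35):.2f}")  # inside the range, so this is allowed

try:
    predict_temperature(60)  # well beyond any chirp count in the data
except ValueError as err:
    print(err)
```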

Correlation Doesn't Necessarily Mean Cause-and-Effect

Scatterplots and correlations identify and quantify relationships between two variables. However, if a scatterplot shows a definite pattern, and the data are found to have a strong correlation, that doesn't necessarily mean that a cause-and-effect relationship exists between the two variables. A cause-and-effect relationship is one where a change in X causes a change in Y. (In other words, the change in Y is not only associated with a change in X, it is directly caused by X.)

For example, suppose a well-controlled medical experiment is conducted to determine the effects of dosage of a certain drug on blood pressure. (See a total breakdown of experiments in Chapter 13.) The researchers look at their scatterplot and see a definite downhill linear pattern; they calculate the correlation and it's strong. They conclude that increasing the dosage of this drug causes a decrease in blood pressure. This cause-and-effect conclusion is okay because they controlled for other variables that could affect blood pressure in their experiment, such as other drugs taken, age, general health, and so on.

However, if you made a scatterplot and examined the correlation between ice cream consumption and murder rates, you would also see a strong linear relationship (this time uphill). Yet no one would claim that more ice cream consumption causes more murders to occur.

What's going on here? In the drug example, the data were collected through a well-controlled medical experiment, which minimizes the influence of other factors that might affect blood pressure changes. In the second example, the data were just based on observation, and no other factors were examined. It turns out that this strong relationship exists because increases in murder rates and ice cream sales are both related to increases in temperature. (Temperature in this case is called a confounding variable; it affects both X and Y but was not included in the study. See Chapter 13.)
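To see how a confounding variable can manufacture a strong correlation, here is a small simulation sketch. Everything in it is made up for illustration: the numbers are synthetic, not real ice cream or crime data, and the names pearson_r, ice_cream, and murders are hypothetical. Temperature drives both variables, and neither one affects the other.

```python
import random


def pearson_r(xs, ys):
    """Compute the correlation coefficient r for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic data: temperature is the confounder behind both variables.
temperature = [rng.uniform(30, 100) for _ in range(5000)]
ice_cream = [2.0 * t + rng.gauss(0, 10) for t in temperature]
murders = [0.5 * t + rng.gauss(0, 3) for t in temperature]

# Across all days, the two variables move together.
r_all = pearson_r(ice_cream, murders)

# Hold temperature roughly fixed by looking only at days between 60 and 65 degrees.
narrow = [(i, m) for t, i, m in zip(temperature, ice_cream, murders) if 60 <= t <= 65]
r_narrow = pearson_r([i for i, _ in narrow], [m for _, m in narrow])

print(f"r over all days: {r_all:.2f}")
print(f"r with temperature held nearly fixed: {r_narrow:.2f}")
```

In this setup the correlation across all days is strong, but it mostly disappears once temperature is held nearly constant, which is what you'd expect when a third variable is doing all the work.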

Whether two variables are found to be causally associated depends on how the study was conducted. Only a well-designed experiment (see Chapter 13) or a large collection of several different observational studies can show enough evidence for cause-and-effect.
