
Authors: Deborah Jean Rumsey



BOOK: Statistics for Dummies

Picturing the Relationship: Plots and Charts

An article in Garden Gate magazine caught my eye. "Count Cricket Chirps to Gauge Temperature," the title read. According to the article, all you have to do is find a cricket, count the number of times it chirps in 15 seconds, add 40, and voila! You've just predicted the temperature in degrees Fahrenheit.

The National Weather Service Forecast Office even puts out its own Cricket Chirp Converter. You enter the number of times the cricket chirps in 15 seconds, and the converter gives you the temperature estimated in four different units, including degrees Fahrenheit and Celsius.
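The rule of thumb from the article is simple arithmetic. Here's a minimal Python sketch. It assumes the converter applies the chirps-plus-40 rule and then converts Fahrenheit to Celsius with the standard formula. The function names are made up for illustration, and the actual National Weather Service converter may compute its estimates differently.

```python
def chirps_to_fahrenheit(chirps_in_15_seconds: int) -> float:
    """Rule of thumb from the article: chirps in 15 seconds plus 40."""
    return chirps_in_15_seconds + 40


def fahrenheit_to_celsius(fahrenheit: float) -> float:
    """Standard unit conversion: C = (F - 32) * 5 / 9."""
    return (fahrenheit - 32) * 5 / 9


if __name__ == "__main__":
    chirps = 30  # hypothetical count from one 15-second listen
    temp_f = chirps_to_fahrenheit(chirps)
    temp_c = fahrenheit_to_celsius(temp_f)
    print(f"{chirps} chirps -> {temp_f:.0f} F, about {temp_c:.1f} C")
```

With 30 chirps, the sketch estimates 70 degrees Fahrenheit, or about 21.1 degrees Celsius.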

A fair amount of research does support the claim that the frequency of cricket chirps is related to temperature. For illustration, I've taken a subset of the data and presented it in Table 18-1. Notice that each observation is made up of two variables that are tied together: in this case, the number of times the cricket chirped in 15 seconds and the temperature at that time (in degrees Fahrenheit). Statisticians call this type of two-dimensional data bivariate data. Each observation contains one pair of data collected simultaneously.

Table 18-1: Cricket Chirps and Temperature Data (Excerpt)

Number of Chirps in 15 Seconds    Temperature (in Degrees Fahrenheit)
18                                57
20                                60
21                                64
23                                65
27                                68
30                                71
34                                74
39                                77
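Each observation in Table 18-1 is one (x, y) pair. The short Python sketch below stores the table as a list of pairs. It then compares each recorded temperature with the article's chirps-plus-40 prediction. The variable and function names are illustrative only.

```python
# Table 18-1 as bivariate data: (chirps in 15 seconds, temperature in degrees F)
cricket_data = [
    (18, 57), (20, 60), (21, 64), (23, 65),
    (27, 68), (30, 71), (34, 74), (39, 77),
]


def rule_of_thumb(chirps: int) -> int:
    """Article's rule: temperature (F) is about chirps in 15 seconds plus 40."""
    return chirps + 40


# Recorded temperature minus the rule's prediction, one value per observation
errors = [temp - rule_of_thumb(chirps) for chirps, temp in cricket_data]

for (chirps, temp), err in zip(cricket_data, errors):
    print(f"{chirps:>3} chirps  recorded {temp} F  "
          f"predicted {rule_of_thumb(chirps)} F  off by {err:+d}")
```

For these eight observations, the rule is never off by more than 3 degrees.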

A recent press release from the Ohio State University Medical Center also caught my attention. The headline says that aspirin can prevent polyps in colon cancer patients. Having had a close relative who succumbed to this disease, I was heartened by the prospect that researchers are making progress in this area, so I decided to look into it. The data from the aspirin versus colon polyps study are summarized in Table 18-2.

Table 18-2: Summary of Aspirin versus Polyps Study Results

Group                     % Developing Polyps*
Aspirin                   17
Non-aspirin (placebo)     27

* Total sample size = 635 (approximately half were randomly assigned to each group)

The raw data for this study contain 635 lines. Each line represents a person in the study and includes the person's identification number, the group to which he or she was assigned (aspirin or non-aspirin), and whether or not the subject developed polyps during the study period (yes or no). For example, line 1 of this data set may look like this:

ID#22292   GROUP = ASPIRIN   DEVELOPED POLYPS = NO
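Turning raw lines like this into something you can summarize is a job for a computer. Here's a minimal Python sketch that pulls the three fields out of each line. It assumes every line follows exactly the format shown above. Only the ID#22292 record comes from the text; the other records, including the NON-ASPIRIN group label, are hypothetical and exist only to exercise the parser.

```python
import re

# Pattern for lines like "ID#22292   GROUP = ASPIRIN   DEVELOPED POLYPS = NO"
RECORD = re.compile(
    r"ID#(?P<id>\d+)\s+GROUP\s*=\s*(?P<group>\S+)\s+"
    r"DEVELOPED POLYPS\s*=\s*(?P<polyps>YES|NO)"
)


def parse_record(line: str) -> tuple[str, str, str]:
    """Return (id, group, developed_polyps) for one raw data line."""
    match = RECORD.search(line)
    if match is None:
        raise ValueError(f"unrecognized line: {line!r}")
    return match["id"], match["group"], match["polyps"]


# The first line comes from the text; the other two are hypothetical.
raw_lines = [
    "ID#22292   GROUP = ASPIRIN   DEVELOPED POLYPS = NO",
    "ID#10001   GROUP = NON-ASPIRIN   DEVELOPED POLYPS = YES",
    "ID#10002   GROUP = ASPIRIN   DEVELOPED POLYPS = YES",
]

records = [parse_record(line) for line in raw_lines]
print(records)
```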

If you look at the raw bivariate data for a data set this large, you'd probably have a hard time deducing any relationship between the variables: 635 lines of data would be hard for anyone (except a computer) to make sense of. Even with the cricket chirps versus temperature data, where you can see a general pattern in the raw data (for example, as the number of chirps increases, the temperature seems to increase as well), the exact relationship is hard to pinpoint.

To make sense of any data, you should first organize them using a table, chart, or graph (see Chapter 4). When the data are bivariate and you're looking for links between the two variables, the charts and graphs need to have two dimensions as well, just like the data do. That's the only way you can explore possible connections between the variables.

Displaying bivariate numerical data

In the case where both variables are quantitative (that is, numerical, such as measures of height and weight), the bivariate data are typically organized in a graph that statisticians call a scatterplot. A scatterplot has two dimensions: a horizontal dimension (called the x-axis) and a vertical dimension (called the y-axis). Both axes are numerical; each one contains a number line.

Making a scatterplot

Placing observations (or points) on a scatterplot is similar to finding a city on a map that uses letters and numbers to mark off sections of the map. Each observation has two coordinates. The first corresponds to the first piece of data in the pair (that's the x-coordinate, the amount that you go left or right). The second corresponds to the second piece of data in the pair (that's the y-coordinate, the amount that you go up or down). Intersect the two coordinates, and that's where you place the point representing that observation. Figure 18-1 shows a scatterplot for the cricket chirps versus temperature data listed in Table 18-1. Because I put the data in order according to their x-values (number of chirps) when I created Table 18-1, the points on the scatterplot (going from left to right) correspond in order to the observations given in Table 18-1.
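The map analogy translates directly into code: each observation's x-value picks a column, and its y-value picks a row. Here's a rough, text-only Python sketch that places the Table 18-1 points on a character grid without any plotting library. The axis ranges and the choice of one column per chirp and one row per degree are arbitrary display assumptions.

```python
def text_scatterplot(points, x_range, y_range):
    """Return a text scatterplot with one column per x unit and one row per y unit.

    Rows are listed from the top (largest y) down, so moving up the grid
    means increasing y, just as on the vertical axis of a real scatterplot.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    width = x_max - x_min + 1
    grid = [[" "] * width for _ in range(y_max - y_min + 1)]
    for x, y in points:
        col = x - x_min          # go right according to the x-coordinate
        row = y_max - y          # go up according to the y-coordinate (row 0 is the top)
        grid[row][col] = "*"     # mark where the two coordinates intersect
    return ["".join(row) for row in grid]


cricket_data = [(18, 57), (20, 60), (21, 64), (23, 65),
                (27, 68), (30, 71), (34, 74), (39, 77)]

lines = text_scatterplot(cricket_data, x_range=(15, 40), y_range=(55, 80))
for temp, line in zip(range(80, 54, -1), lines):
    print(f"{temp:>3} |{line}")
print("    +" + "-" * 26)
print("     chirps in 15 seconds (15 to 40)")
```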

Figure 18-1: Scatterplot of cricket chirps versus outdoor temperature.

Interpreting a scatterplot

You interpret a scatterplot by looking for trends in the data as you go from left to right:

  • If the data resemble an uphill line as you move from left to right, this indicates a positive linear relationship (sometimes loosely called a proportional relationship). As x increases (moves right one unit), y increases (moves up) by about the same amount each time.

  • If the data resemble a downhill line as you move from left to right, this indicates a negative linear relationship (sometimes loosely called an inverse relationship). That means that as x increases (moves right one unit), y decreases (moves down) by about the same amount each time.

  • If the data don't seem to resemble any kind of line (even a vague one), this suggests that no linear relationship exists.
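You can back up an eyeball judgment of "uphill" or "downhill" with a number. One common choice, sketched here in Python, is the sign of the correlation coefficient r computed from the pairs. The cutoff of 0.3 for "no clear linear relationship" is a hypothetical threshold chosen only for illustration, not a standard rule.

```python
import math


def correlation(points):
    """Pearson correlation coefficient r for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / math.sqrt(sxx * syy)


def describe_trend(points, cutoff=0.3):
    """Label the linear trend; the cutoff is a hypothetical illustration value."""
    r = correlation(points)
    if r >= cutoff:
        return "uphill (positive linear)"
    if r <= -cutoff:
        return "downhill (negative linear)"
    return "no clear linear relationship"


cricket_data = [(18, 57), (20, 60), (21, 64), (23, 65),
                (27, 68), (30, 71), (34, 74), (39, 77)]
print(round(correlation(cricket_data), 3), describe_trend(cricket_data))
```

For the cricket data in Table 18-1, r comes out to about 0.98, which matches the uphill pattern in the scatterplot.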

In Figure 18-1, there does appear to be a positive linear relationship between the number of cricket chirps and the temperature. That is, as the number of chirps increases, the temperature increases as well. Is this a cause-and-effect relationship (in other words, do temperature increases cause crickets to chirp faster)? That remains to be seen, because these data come only from an observational study, not an experiment (see Chapter 17).

HEADS UP 

Visual displays of bivariate data show possible associations or relationships between two variables. However, just because your graph or chart shows that something is going on, it doesn't mean that a cause-and-effect relationship exists. For example, if you look at a scatterplot of ice cream consumption and murder rates, those two variables have a positive linear relationship, too. Yet no one would claim that ice cream consumption causes murders, or that murder rates affect ice cream consumption. If someone is trying to establish a cause-and-effect relationship by showing a chart or graph, dig deeper to find out how the study was designed and how the data were collected, and then evaluate the study appropriately using the criteria outlined in Chapter 17.

TECHNICAL STUFF 

Other types of trends may exist besides uphill or downhill linear trends. Variables can be related to each other through curved relationships, such as exponential ones; these are beyond the scope of this book. The good news, however, is that many relationships can be characterized as uphill or downhill linear relationships.
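The following Python sketch shows why a check for straight-line trends can miss a curved relationship. Here y is completely determined by x (y equals x squared), yet the correlation coefficient is zero, because the points go downhill and then uphill. The data are made up purely for illustration.

```python
import math


def correlation(points):
    """Pearson correlation coefficient r for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / math.sqrt(sxx * syy)


# Made-up curved data: y depends entirely on x, but not along a line
curved = [(x, x ** 2) for x in range(-3, 4)]
print(curved)
print("r =", correlation(curved))  # zero: no linear trend, despite a clear pattern
```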

Displaying bivariate categorical data

In the case where both variables are categorical (such as the gender of the respondent and whether the respondent does or does not support the president), the data are typically summarized in what statisticians call a two-way table (meaning a table that has rows representing the categories of the first variable and columns representing the categories of the second variable).
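Building a two-way table comes down to counting how many observations fall into each combination of categories. Here's a minimal Python sketch that does the counting with collections.Counter. The handful of (group, developed polyps) pairs is hypothetical; these are not the study's actual 635 records.

```python
from collections import Counter

# Hypothetical (group, developed polyps) pairs, one per patient
observations = [
    ("Aspirin", "No"), ("Aspirin", "No"), ("Aspirin", "Yes"), ("Aspirin", "No"),
    ("Non-aspirin", "Yes"), ("Non-aspirin", "No"),
    ("Non-aspirin", "Yes"), ("Non-aspirin", "No"),
]

counts = Counter(observations)           # cell counts keyed by (row, column)
groups = ["Aspirin", "Non-aspirin"]      # rows: categories of the first variable
outcomes = ["Yes", "No"]                 # columns: categories of the second variable

print(f"{'Group':<12}{'Yes':>5}{'No':>5}{'% Yes':>8}")
for g in groups:
    row_total = sum(counts[(g, o)] for o in outcomes)
    pct_yes = 100 * counts[(g, "Yes")] / row_total
    print(f"{g:<12}{counts[(g, 'Yes')]:>5}{counts[(g, 'No')]:>5}{pct_yes:>8.1f}")
```

The last column shows how a percentage like those in Table 18-2 comes from counts of a yes/no variable, with one row per group.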

For example, in the aspirin versus colon polyps study, both variables were categorical: whether or not the colon cancer patient took aspirin (yes or no), and whether or not that person developed any more polyps (yes or no). Note that Table 18-2 in the "Picturing the Relationship: Plots and Charts" section is a two-way table.

A more visually pleasing way to organize two-dimensional data is to use a bar graph or a series of pie charts. Figure 18-2 shows a bar graph of the percentage of patients who developed polyps in the aspirin group compared to the non-aspirin (placebo) group. Figure 18-3 shows two pie charts, one for the aspirin group and one for the non-aspirin group. Each pie chart shows the percentage of patients in that group who developed polyps.

Figure 18-2: Bar graph showing the results of the polyps versus aspirin study.

Figure 18-3: Pie charts showing the results of the polyps versus aspirin study.
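The arithmetic behind both displays is short. This Python sketch uses the percentages from Table 18-2. Each bar's length is proportional to its percentage; one character per percentage point is an arbitrary display choice. Each pie slice's angle is its share of 360 degrees.

```python
# Percentage developing polyps in each group (from Table 18-2)
pct_polyps = {"Aspirin": 17, "Non-aspirin (placebo)": 27}

# Text bar graph: one "#" per percentage point
for group, pct in pct_polyps.items():
    print(f"{group:<22}{'#' * pct} {pct}%")


def pie_angles(pct_yes):
    """Slice angles in degrees for one group's pie: (polyps, no polyps)."""
    return pct_yes / 100 * 360, (100 - pct_yes) / 100 * 360


for group, pct in pct_polyps.items():
    yes_deg, no_deg = pie_angles(pct)
    print(f"{group}: polyps slice {yes_deg:.1f} deg, no-polyps slice {no_deg:.1f} deg")
```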

Because the bars in the bar graph differ noticeably in height and the two pie charts look quite different, a relationship does appear to exist between taking aspirin and developing polyps among the subjects of this study. (The operative word here is "appear." A hypothesis test for two proportions is needed to determine whether the difference seen in these samples is large enough to carry over to the populations they represent, rather than being due to chance. See Chapter 15 for more on this.)
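Here's a minimal Python sketch of the usual two-proportion z-test, to give a feel for what such a test involves. The text says only that approximately half of the 635 patients were assigned to each group, so the sketch assumes groups of 317 and 318. It also uses the rounded percentages from Table 18-2. Treat the output as an illustration under those assumptions, not as the study's reported result.

```python
import math


def two_proportion_z_test(p1, n1, p2, n2):
    """Two-sided z-test for the difference between two sample proportions.

    Uses the pooled proportion under the null hypothesis that the two
    population proportions are equal. Returns (z, p_value).
    """
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_err
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Assumed group sizes (a hypothetical split of 635); rounded proportions from Table 18-2
z, p = two_proportion_z_test(0.17, 317, 0.27, 318)
print(f"z = {z:.2f}, two-sided p-value = {p:.4f}")
```

Under these assumptions, the sketch gives a z-statistic of roughly -3.0 and a p-value below 0.01. Chapter 15 covers how to interpret results like these.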

HEADS UP 

Be skeptical of anyone who draws conclusions about a relationship between two variables based only on a chart or graph. Looks can be deceiving (see Chapter 4). Statistical measurements and tests should be done to show that such relationships are statistically meaningful (see Chapter 14).

Tip 

In the case of the aspirin versus polyps data, you may think that the second variable (polyps) is numerical because its value is represented by a percentage. In fact, that's not the case. Percentages are just a handy way to summarize the data from a categorical variable. Here, the second variable is whether or not polyps developed (yes or no) for each patient, which is a categorical variable. The percentages simply report what fraction of the patients in each group fell into the yes category.
