Many graphs and charts contain misinformation, mislabeled information, or misleading information, or they simply lack important information that the reader needs to make critical decisions about what is being presented.
Figure 21-1 shows examples of four types of data displays: pie charts, bar graphs, time charts, and histograms. (Note that a histogram is basically a bar graph for numerical data.) For each type, I outline some of the most common ways that you can be misled. (For more information on charts and graphs, including misleading charts and graphs, see Chapter 4.)
Pie charts are exactly what they sound like: charts that are in the shape of a circle (or pie) and divided into slices that represent the percentage of individuals that fall into each group (according to some categorical variable, such as gender, political party, or employment status).
Here's how to sink your teeth into a pie chart and test it for quality:
Check to be sure the percentages add up to 100%, or close to it (any round-off error should be small).
Beware of slices of the pie called "other" that are larger than the rest of the slices. This means the pie chart is too vague.
Watch for distortions that come with the three-dimensional-looking pie charts, in which the slice closest to you looks larger than it really is because of the angle at which it's presented.
Look for a reported total number of individuals who make up the pie chart, so you can determine how big the pie was before it was divided up into the slices that you're looking at. If the size of the data set (the number of respondents) is too small, the information isn't reliable.
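The first two checks in this list are easy to automate. Here's a minimal Python sketch, assuming the pie chart is given as a dictionary of slice labels and percentages; the function name `check_pie_chart`, the 1-point round-off tolerance, and the sample data are all made up for illustration.

```python
def check_pie_chart(slices, tolerance=1.0):
    """Return a list of warnings for a pie chart given as {label: percent}."""
    warnings = []

    # Check 1: the percentages should add up to 100%, give or take round-off.
    total = sum(slices.values())
    if abs(total - 100.0) > tolerance:
        warnings.append(f"percentages sum to {total:g}%, not 100%")

    # Check 2: an "other" slice larger than every named slice is too vague.
    other = slices.get("other")
    if other is not None:
        rest = [pct for label, pct in slices.items() if label != "other"]
        if rest and other > max(rest):
            warnings.append("'other' is the largest slice; the chart is too vague")

    return warnings


# Made-up employment-status chart that fails both checks.
chart = {"full-time": 30, "part-time": 20, "other": 45}
for warning in check_pie_chart(chart):
    print(warning)
```

The other two checks (3-D distortion and a missing total) need your eyes rather than arithmetic.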
A bar graph is similar to a pie chart, except that instead of being in the shape of a circle that's divided up into slices, a bar graph represents each group as a bar, and the height of the bar represents the number or percentage of individuals in that group.
When examining a bar graph:
Consider the units being represented by the height of the bars and what the results mean in terms of those units. For example, compare the total number of crimes versus the crime rate, which is measured by the total number of crimes per capita (that is, per person).
Evaluate the appropriateness of the scale, or amount of space between units expressing the number in each group of the bar graph. For example, the number of customer complaints for men versus women could be expressed in units of 1, 5, 10, or 100. Small scales (for example, going from 1 to 500 by 10s) make differences look bigger; large scales (for example, going from 1 to 500 by 100s) make differences look smaller.
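To see how much the choice of units can change the story a bar graph tells, here's a small Python sketch comparing total crimes with a crime rate. The two cities and their figures are invented, and I express the rate per 1,000 residents (the per-capita rate times 1,000) only to keep the numbers readable.

```python
# Hypothetical figures for two cities (invented for illustration).
cities = {
    "Bigtown": {"crimes": 5000, "population": 1_000_000},
    "Smallville": {"crimes": 500, "population": 50_000},
}


def crimes_per_1000(crimes, population):
    """Crime rate per 1,000 residents (per-capita rate times 1,000)."""
    return 1000 * crimes / population


rates = {name: crimes_per_1000(**data) for name, data in cities.items()}

for name, data in cities.items():
    print(f"{name}: {data['crimes']} crimes, {rates[name]:.1f} per 1,000 residents")
```

In this made-up example, a bar graph of total crimes makes Bigtown look ten times worse, while a bar graph of crime rates shows Smallville with twice the rate.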
A time chart shows how some measurable quantity changes over time (for example, stock prices, average household income, and average temperature).
Here are some issues to watch for with time charts:
Watch the scale on the vertical (quantity) axis as well as the horizontal (timeline) axis; results can be made to look more or less dramatic than they actually are by simply changing the scale.
Take into account the units being portrayed by the chart and be sure they are equitable for comparison over time; for example, are dollar amounts being adjusted for inflation?
Beware of people trying to explain why a trend is occurring without additional statistics to back themselves up. A time chart generally shows what is happening. Why it's happening is another story!
Watch for situations in which the time axis isn't marked with equally spaced jumps. This often happens when data are missing. For example, the time axis may show equal spacing between 1971, 1972, 1975, 1976, and 1978, when it should actually show empty spaces for the years in which no data are available.
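If you have the years that a time chart actually plots, finding the gaps takes only a few lines. This Python sketch uses the years from the example above; the helper name `missing_years` is my own.

```python
def missing_years(years):
    """Return the years between the first and last observation that have no data."""
    present = set(years)
    return [y for y in range(min(years), max(years) + 1) if y not in present]


plotted = [1971, 1972, 1975, 1976, 1978]
print(missing_years(plotted))  # [1973, 1974, 1977]
```

Any year this returns should appear as an empty space on the time axis, not be squeezed out of it.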
A histogram is a graph that breaks the sample into groups according to a numerical variable (such as age, height, weight, or income) and shows either the number of individuals (frequency) or the percentage of individuals (relative frequency) that fall into each group. For example, perhaps 20% of a sample was 20 years old or less; 30% was between 20 and 40 years old; 45% was between 40 and 60 years old; and 5% was over 60 years old (age being the numerical variable here).
Some items to watch for regarding histograms include the following:
Watch the scale used for the vertical (frequency/relative frequency) axis, looking especially for results that are exaggerated or played down through the use of inappropriate scales.
Check out the units on the vertical axis, whether they're reporting frequencies or relative frequencies; take that information into account when examining the information.
Look at the scale used for the groupings of the numerical variable on the horizontal axis. If the groups are based on small intervals (for example, 0–2, 2–4, and so on), the data may look overly volatile. If the groups are based on large intervals (for example, 0–100, 100–200, and so on), the data may give a smoother appearance than is realistic.
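The effect of the grouping width is easy to try for yourself. The Python sketch below counts the same made-up ages using very narrow groups and very wide groups. The helper `bin_counts` and its convention that each group includes its lower endpoint but not its upper one are assumptions made for the example.

```python
from collections import Counter


def bin_counts(values, width):
    """Count values in groups [k*width, (k+1)*width), keyed by lower endpoint."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))


# Made-up ages for 15 people.
ages = [3, 7, 12, 18, 21, 22, 25, 31, 38, 44, 47, 52, 58, 63, 71]

narrow = bin_counts(ages, 2)   # groups 0-2, 2-4, and so on
wide = bin_counts(ages, 40)    # groups 0-40, 40-80

print(len(narrow), "narrow groups, largest count:", max(narrow.values()))
print("wide groups:", wide)
```

With these ages, the narrow grouping scatters the 15 people across 15 separate bars of height 1 with gaps in between, while the wide grouping collapses everything into just two bars. Neither picture tells you much about the shape of the data.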
Bias in statistics is the result of a systematic error that either overestimates or underestimates the true value. For example, if I use a ruler to measure plants and that ruler is 1/2-inch short, all of my results are biased; they're systematically lower than their true values.
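What makes bias different from ordinary random error is that collecting more data doesn't wash it out. Here's a rough Python simulation of the ruler example, assuming a plant whose true height is 12 inches, a ruler that reads half an inch low, and a little random measurement noise. All three numbers are invented for the sketch.

```python
import random
import statistics

random.seed(1)

TRUE_HEIGHT = 12.0    # inches; hypothetical plant
RULER_OFFSET = -0.5   # assumption: the ruler reads half an inch low


def measure():
    """One reading: true height, plus the systematic offset, plus random noise."""
    noise = random.gauss(0, 0.2)  # random, unsystematic error
    return TRUE_HEIGHT + RULER_OFFSET + noise


readings = [measure() for _ in range(10_000)]
average_error = statistics.mean(readings) - TRUE_HEIGHT

# The random noise averages out; the half-inch offset does not.
print(f"average error over {len(readings)} readings: {average_error:.2f} inches")
```

Even with 10,000 readings, the average error stays close to the ruler's offset rather than shrinking toward zero.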
Here are some of the most common sources of biased data:
Measurement instruments are systematically off.
For example, a police officer's radar gun says you were going 76 miles per hour, but you know you were only going 72 miles per hour. Or a bathroom scale always adds 5 pounds to your weight.
Participants are influenced by the data collection process.
For example, a survey question that asks, "Have you ever disagreed with the government?" will overestimate the percentage of people who are unhappy with the government.
Sample of individuals doesn't represent the population of interest.
For example, examining study habits by visiting only the campus library puts more emphasis on those who study the most. (See more on this in the "Non-Random Samples" section later in this chapter.)
Researchers aren't objective.
For example, suppose one group of patients is given a sugar pill, and the other group is given a real drug. The researchers shouldn't know who got which treatment, because if they do, they may inadvertently project results onto the patients (such as saying, "You're feeling better, aren't you?") or pay more attention to those on the drug.
HEADS UP | To spot biased data, examine how the data were collected. Ask questions about the selection of the participants, how the study was conducted, what questions were used, what |
The word "error" has a somewhat negative connotation, as if an error is something that is always avoidable. In statistics, that's not always the case. For example, a certain amount of what statisticians call sampling error will always occur whenever someone tries to estimate a population value using anything other than the entire population. Just the act of selecting a sample from the population means you leave out certain individuals, and that means you're not going to get the precise, exact population value. No worries, though. Remember that statistics means never having to say you're certain; you only have to get close. And if the sample is large and is randomly selected, the sampling error will be small.
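You can watch sampling error shrink as the sample grows with a quick simulation. This Python sketch assumes a made-up population of 100,000 people in which exactly 60% would answer "yes" to some question, and compares how far off random samples of 25 and 2,500 people typically land. The function name `typical_error` and the 200 repetitions are my own choices.

```python
import random
import statistics

random.seed(42)

# Hypothetical population: 60% say "yes" (1), 40% say "no" (0).
population = [1] * 60_000 + [0] * 40_000
TRUE_P = 0.6


def typical_error(n, repeats=200):
    """Average distance between the sample proportion and the true proportion."""
    errors = []
    for _ in range(repeats):
        sample = random.sample(population, n)  # simple random sample
        errors.append(abs(sum(sample) / n - TRUE_P))
    return statistics.mean(errors)


small_err = typical_error(25)
large_err = typical_error(2500)

print(f"typical error with n = 25:   {small_err:.3f}")
print(f"typical error with n = 2500: {large_err:.3f}")
```

Neither error is zero, because no sample is the whole population, but the larger random sample lands much closer to the truth.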
To evaluate a statistical result, you need a measure of its accuracy, typically the margin of error. The margin of error tells you how much the researcher expects his or her results to vary from sample to sample. (For more information on margin of error, see Chapter 10.) When a researcher or the media fail to report the margin of error, you're left to wonder about the accuracy of the results, or worse, you just assume that everything is fine, when in many cases, it's not. Survey results shown on TV used to include a margin of error only rarely, but now you often see that information reported. Still, many newspapers, magazines, and Internet surveys fail to report a margin of error, or they report a margin of error that is meaningless because the data are biased (see Chapter 10).
HEADS UP | When looking at statistical results in which a number is being estimated (for example, the percentage of people who think the president is doing a good job) always check for the margin of error. If it's not included, ask for it! (Or if given enough other pertinent information, you can calculate the margin of error yourself using the formulas in |
From survey data to results of medical studies, most statistics are based on data collected from samples of individuals, rather than on entire populations, because of cost and time considerations. Plus, you don't need a very big sample to be amazingly accurate, if the sample being studied is representative of the population. For example, a well-designed and well-conducted survey of 2,500 people has a margin of error of only about plus or minus 2% (see Chapter 10). For an experiment with a treatment group and a control group, statisticians would like to see at least 30 people in each group in order to have accurate data.
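The 2% figure for a survey of 2,500 people can be checked with a few lines of Python. The sketch below uses the common large-sample formula for the margin of error of a sample proportion at 95% confidence, with the most conservative proportion of 50%. That formula is my assumption here; the formulas in Chapter 10 may be presented differently.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a sample proportion.

    n: sample size
    p: sample proportion (0.5 gives the largest, most conservative margin)
    z: multiplier for the confidence level (1.96 for about 95%)
    """
    return z * math.sqrt(p * (1 - p) / n)


print(f"{margin_of_error(2500):.3f}")  # 0.020
```

Keep in mind that this calculation only measures sampling error from a random sample; it says nothing about bias.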
How can you ensure that the sample represents the population? The best way is to randomly select the individuals from the population. A random sample is a subset of the population selected in such a way that each member of the population has an equal chance of being selected (like drawing names out of a hat). No systematic favoritism or exclusion is involved in a random sample.
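Drawing names out of a hat is something a computer does well. Here's a minimal Python sketch, assuming you have a complete list of the population; the 1,000 placeholder names, the sample size of 50, and the seed are all made up.

```python
import random

# Hypothetical population list: everyone you could possibly select.
population = [f"person_{i}" for i in range(1, 1001)]

# A seeded generator makes the draw repeatable; the seed value is arbitrary.
rng = random.Random(2024)

# Draw 50 names without replacement. Every member of the population
# has the same chance of ending up in the sample.
sample = rng.sample(population, k=50)

print(sample[:5])
```

The hard part in practice isn't the draw itself; it's getting a list that really does include the whole population.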
Many surveys and studies aren't based on random samples of individuals. For example, medical studies are often performed with volunteers who, obviously, volunteer; they're not randomly selected. It wouldn't be practical to phone people and say, "You were chosen at random to participate in our sleep study. You'll need to come down to our lab and stay there for four nights." In this situation, the best you can do is study the volunteers and see how well they represent the population, and then report those results. You can also ask for certain types of volunteers.
Polls and surveys also need to be based on randomly selected individuals, and doing so is much easier than with medical studies. However, many surveys aren't based on random samples. For example, TV polls asking viewers to "call us with your opinion" aren't based on random samples. These surveys don't give the entire population an equal chance to be chosen (in fact, in these examples, the people choose themselves).
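To get a feel for how far off a call-in poll can be, here's a rough Python simulation. Everything in it is an assumption made for illustration: a population in which 40% support some proposal, supporters who feel strongly enough to call in 10% of the time, and everyone else calling in only 2% of the time.

```python
import random

rng = random.Random(7)

# Hypothetical population: 40% support a proposal (True), 60% don't (False).
population = [True] * 40_000 + [False] * 60_000
TRUE_SUPPORT = 0.40


def calls_in(supports):
    """Assumption: supporters call in 10% of the time, others only 2%."""
    return rng.random() < (0.10 if supports else 0.02)


# Self-selected sample: people decide for themselves whether to take part.
callers = [person for person in population if calls_in(person)]
call_in_estimate = sum(callers) / len(callers)

# Random sample: every person has the same chance of being chosen.
random_sample = rng.sample(population, 2500)
random_estimate = sum(random_sample) / len(random_sample)

print(f"true support:          {TRUE_SUPPORT:.2f}")
print(f"call-in poll estimate: {call_in_estimate:.2f}")
print(f"random sample of 2500: {random_estimate:.2f}")
```

Under these assumptions, the call-in poll hears from thousands of people and still badly overstates support, while the smaller random sample lands near the true value.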
HEADS UP | Before making any decisions about statistical results from a survey or a study, look to see how the sample of individuals was selected. If the sample wasn't selected randomly, take the results with a grain of salt. |
The quantity of information is always important in terms of assessing how accurate a statistic will be. The more information that goes into a statistic, the more accurate that statistic will be, as long as that information isn't biased, of course (see the "Biased Data" section earlier in this chapter). As a consumer of statistical information, you need to assess its accuracy, and for that, you need to look at how the information was collected (see Chapter 16 regarding surveys and Chapter 17 regarding experiments), and how much information was collected (that is, you have to know the sample size).
Many charts and graphs that appear in the media don't include a sample size. You also find that many headlines are "not exactly" what they seem to be, when the details of an article reveal either a small sample size (reducing reliability in the results) or in some cases, no information at all about the sample size. (For example, you've probably seen the chewing gum ad that says, "Four out of five dentists surveyed recommend [this gum] for their patients who chew gum." What if they really did ask only five dentists?)
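Here's why a sample of five dentists would be so unconvincing. Suppose, purely as an assumption, that only half of all dentists actually recommend the gum. The Python sketch below uses the binomial formula to find the chance that a random sample would still show at least four out of five recommending it; the function name `prob_at_least` is my own.

```python
import math


def prob_at_least(k, n, p):
    """Chance of k or more successes in n independent tries, each with chance p."""
    return sum(
        math.comb(n, i) * p**i * (1 - p) ** (n - i)
        for i in range(k, n + 1)
    )


# Assumption: the true share of dentists who recommend the gum is only 50%.
small = prob_at_least(4, 5, 0.5)      # at least 4 of 5 dentists
large = prob_at_least(400, 500, 0.5)  # at least 400 of 500 dentists

print(f"at least 4 of 5:     {small:.4f}")  # 0.1875
print(f"at least 400 of 500: {large:.1e}")
```

With five dentists, an impressive-sounding "four out of five" turns up almost one time in five by luck alone. With 500 dentists, the same 80% result would be essentially impossible unless most dentists really did recommend the gum.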
HEADS UP | Always look for the sample size before making decisions about statistical information. The smaller the sample size, the less reliable the information. If the sample size is missing from the article, get a copy of the full report of the study, contact the researcher, or contact the journalist who wrote the article. |