Statistics Essentials For Dummies

Authors: Deborah Rumsey

Analyzing Data

After the data have been collected and described using pictures and numbers, then comes the fun part: navigating through that black box called the statistical analysis. If the study has been designed properly, the original questions can be answered using the appropriate analysis, the operative word here being appropriate. Many types of analyses exist; choosing the wrong one will lead to wrong results.

In this book I cover the major types of statistical analyses encountered in introductory statistics. Scenarios involving a fixed number of independent trials, where each trial results in either success or failure, use the binomial distribution, described in Chapter 4. When the data follow a bell-shaped curve, the normal distribution, covered in Chapter 5, is used to model the data.

Chapter 7 deals with confidence intervals, used when you want to make estimates involving one or two population means or proportions using a sample of data. Chapter 8 focuses on testing someone's claim about one or two population means or proportions; these analyses are called hypothesis tests. If your data set is small and follows a bell shape, the t-distribution might be in order; see Chapter 9.

Chapter 10 examines relationships between two numerical variables (such as height and weight) using correlation and simple linear regression. Chapter 11 studies relationships between two categorical variables (where the data place individuals into groups, such as gender and political affiliation). You can find a fuller treatment of these topics in Statistics For Dummies (Wiley), and analyses that are more complex than that are discussed in the book Statistics II For Dummies, also published by Wiley.

Making Conclusions

Researchers perform analysis with computers, using formulas. But neither a computer nor a formula knows whether it's being used properly, and they don't warn you when your results are incorrect. At the end of the day, computers and formulas can't tell you what the results mean. It's up to you.

One of the most common mistakes made in conclusions is to overstate the results, or to generalize the results to a larger group than was actually represented by the study. For example, a professor wants to know which Super Bowl commercials viewers liked best. She gathers 100 students from her class on Super Bowl Sunday and asks them to rate each commercial as it is shown. A top 5 list is formed, and she concludes that Super Bowl viewers liked those 5 commercials the best. But she really only knows which ones her students liked best. She didn't study any other groups, so she can't draw conclusions about all viewers.

Statistics is about much more than numbers. It's important to understand how to make appropriate conclusions from studying data, and that's something I discuss throughout the book.

Chapter 2: Descriptive Statistics

In This Chapter

Statistics to measure center

Standard deviation, variance, and other measures of spread

Measures of relative standing

Descriptive statistics are numbers that summarize some characteristic about a set of data. They provide you with easy-to-understand information that helps answer questions. They also help researchers get a rough idea about what's happening in their experiments so later they can do more formal and targeted analyses. Descriptive statistics make a point clearly and concisely.

In this chapter you see the essentials of calculating and evaluating common descriptive statistics for measuring center and variability in a data set, as well as statistics to measure the relative standing of a particular value within a data set.

Types of Data

Data come in a wide range of formats. For example, a survey might ask questions about gender, race, or political affiliation, while other questions might be about age, income, or the distance you drive to work each day. Different types of questions result in different types of data to be collected and analyzed. The type of data you have determines the type of descriptive statistics that can be found and interpreted.

There are two main types of data: categorical (or qualitative) data and numerical (or quantitative) data. Categorical data record qualities or characteristics about the individual, such as eye color, gender, political party, or opinion on some issue (using categories such as agree, disagree, or no opinion). Numerical data record measurements or counts regarding each individual. Measurements may include weight, age, height, or time to take an exam; counts may include number of pets, or the number of red lights you hit on your way to work. The important difference between the two is that with categorical data, any numbers involved do not have real numerical meaning (for example, using 1 for male and 2 for female), while all numerical data represent actual numbers for which math operations make sense.
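To see the difference in practice, here is a minimal Python sketch. The coded responses and ages are made-up values for illustration, not data from the book. Python will happily average the gender codes, but the result has no real meaning; counting the categories is the summary that makes sense.

```python
from collections import Counter
from statistics import mean

# Hypothetical coded responses: 1 = male, 2 = female (labels, not quantities).
gender_codes = [1, 2, 2, 1, 2]
# Hypothetical numerical data: ages in years.
ages = [34, 27, 45, 31, 52]

# Categorical data: translate codes back to labels and count each category.
code_labels = {1: "male", 2: "female"}
gender_counts = Counter(code_labels[code] for code in gender_codes)

# Numerical data: arithmetic such as averaging is meaningful.
mean_age = mean(ages)

# This runs without complaint, but an "average gender code" describes nothing real.
mean_code = mean(gender_codes)

print("Counts by gender:", dict(gender_counts))
print("Mean age:", mean_age)
print("Mean of gender codes (not meaningful):", mean_code)
```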

A third type of data, ordinal data, falls in between, where data appear in categories, but the categories have a meaningful order, such as ratings from 1 to 5, or class ranks of freshman through senior. Ordinal data can be analyzed like categorical data, and the basic numerical data techniques also apply when categories are represented by numbers that have meaning.

Counts and Percents

Categorical data place individuals into groups, such as male/female, own your home/don't own, or Democrat/Republican/Independent/Other. Categorical data often come from survey data, but they can also be collected in experiments. For example, in a test of a new medical treatment, researchers may use three categories to assess the outcome: Did the patient get better, worse, or stay the same?

Categorical data are typically summarized by reporting either the number of individuals falling into each category, or the percentage of individuals falling into each category. For example, pollsters may report the percentage of Republicans, Democrats, Independents, and others who took part in a survey. To calculate the percentage of individuals in a certain category, find the number of individuals in that category, divide by the total number of people in the study, and then multiply by 100%. For example, if a survey of 2,000 teenagers included 1,200 females and 800 males, the resulting percentages would be (1,200 ÷ 2,000) × 100% = 60% female and (800 ÷ 2,000) × 100% = 40% male.
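The percentage calculation above can be sketched in a few lines of Python. The function name category_percents and the list of responses are my own illustration; the counts are the 1,200 females and 800 males from the example.

```python
from collections import Counter

# Responses matching the example: 1,200 females and 800 males (2,000 teenagers).
responses = ["female"] * 1200 + ["male"] * 800

def category_percents(values):
    """Return {category: percent}, computed as count * 100 / total."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count * 100 / total for category, count in counts.items()}

percents = category_percents(responses)
for category, pct in percents.items():
    print(f"{category}: {pct:.0f}%")
```

Because every individual falls into exactly one category, the percentages should add up to 100%, which makes a handy check on your arithmetic.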

You can further break down categorical data by creating crosstabs. Crosstabs (also called two-way tables) are tables with rows and columns. They summarize the information from two categorical variables at once, such as gender and political party, so you can see (or easily calculate) the percentage of individuals in each combination of categories. For example, if you had data about the gender and political party of your respondents, you would be able to look at the percentage of Republican females, Democratic males, and so on. In this example, the total number of possible combinations in your table would be the total number of gender categories times the total number of party affiliation categories. The U.S. government calculates and summarizes loads of categorical data using crosstabs. (See Chapter 11 for more on two-way tables.)
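Here is one way a crosstab might be built by hand in Python. The respondents and their counts are invented for illustration, and the crosstab function is a hypothetical helper, not something from the book. The sketch counts each gender/party combination, confirms that the number of cells is the number of gender categories times the number of party categories, and prints each cell as a percentage of all respondents.

```python
from collections import Counter

# Hypothetical respondents as (gender, party) pairs; the counts are made up.
respondents = (
    [("female", "Democrat")] * 30
    + [("female", "Republican")] * 20
    + [("male", "Democrat")] * 25
    + [("male", "Republican")] * 25
)

def crosstab(pairs):
    """Count individuals in each combination of two categorical variables."""
    cell_counts = Counter(pairs)
    rows = sorted({row for row, _ in pairs})
    cols = sorted({col for _, col in pairs})
    return rows, cols, cell_counts

rows, cols, cell_counts = crosstab(respondents)
total = len(respondents)

# Number of cells = (number of row categories) x (number of column categories).
n_cells = len(rows) * len(cols)
print("Cells in the table:", n_cells)

# Print the two-way table, each cell as a percentage of all respondents.
print(f"{'':10}" + "".join(f"{col:>12}" for col in cols))
for row in rows:
    cells = "".join(f"{cell_counts[(row, col)] * 100 / total:>11.0f}%" for col in cols)
    print(f"{row:10}" + cells)
```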

If you're given the number of individuals in each category, you can always calculate your own percents. But if you're only given percentages without the total number in the group, you can never retrieve the original number of individuals in each group. For example, you might hear that 80% of people surveyed prefer Cheesy cheese crackers over Crummy cheese crackers. But how many were surveyed? It could be only 10 people, for all you know, because 8 out of 10 is 80%, just as 800 out of 1,000 is 80%. These two fractions (8 out of 10 and 800 out of 1,000) have different meanings for statisticians, because the first is based on very little data, and the second is based on a lot of data. (See Chapter 7 for more information on data accuracy and margin of error.)
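A short Python sketch shows why the two fractions aren't interchangeable. The percent function follows the calculation described earlier in this section. The rough_margin_of_error function is an assumption on my part: it uses a common large-sample approximation for a proportion at about 95% confidence, which this chapter doesn't present and which is only a rough guide for a sample as small as 10.

```python
import math

def percent(count, total):
    """Percentage of the group that falls in the category."""
    return count * 100 / total

def rough_margin_of_error(count, total, z=1.96):
    """Approximate margin of error in percentage points.

    Assumption: large-sample formula z * sqrt(p * (1 - p) / n), with
    z = 1.96 for roughly 95% confidence. Treat it as a rough guide only.
    """
    p = count / total
    return z * math.sqrt(p * (1 - p) / total) * 100

for count, total in [(8, 10), (800, 1000)]:
    pct = percent(count, total)
    moe = rough_margin_of_error(count, total)
    print(f"{count} out of {total}: {pct:.0f}%, give or take about {moe:.1f} points")
```

Both samples report the same 80%, but under this approximation the small sample's margin of error is roughly ten times wider than the large sample's.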

Measures of Center

The most common way to summarize a numerical data set is to describe where the center is. One way of thinking about what the center of a data set means is to ask, "What's a typical value?" Or, "Where is the middle of the data?" The center of a data set can be measured in different ways, and the method chosen can greatly influence the conclusions people make about the data. In this section I present the two most common measures of center: the mean (or average) and the median.