The Bell Curve: Intelligence and Class Structure in American Life

Authors: Richard J. Herrnstein, Charles A. Murray

10
The doctrine has been built into the U.S. Employment and Training Service’s General Aptitude Test Battery (GATB), into the federal civil service’s Professional and Administrative Career Examination (PACE), and into the military’s Armed Services Vocational Aptitude Battery (ASVAB). Bartholet 1982; Braun 1992; Gifford 1989; Kelman 1991; Seymour 1988. For a survey of test instruments and their use, see Friedman and Williams 1982.

11
For a recent review of the expert community as a whole, see Schmidt and Ones 1992.

12
Hartigan and Wigdor 1989 and Schmidt and Hunter 1991 represent the two ends of the range of expert opinion.

13
For a sampling of the new methods, see Bangert-Drowns 1986; Glass 1976; Glass, McGaw, and Smith 1981; Hunter and Schmidt 1990. Meta-analytic strategies had been tried for decades prior to the 1970s, but it was after the advent of powerful computers and statistical software that many of the techniques became practicable.

14
Hartigan and Wigdor 1989; Hunter and Schmidt 1990; Schmidt and Hunter 1981.

15
We have used the terms job productivity or job performance or performance ratings without explaining what they mean or how they are measured. On the other hand, all of us have a sense of what job productivity is like—we are confident that we know who are the better and worse secretaries, managers, and colleagues among those with whom we work closely. But how is this knowledge to be captured in objective measures? Ratings by supervisors or peers? Samples of work in the various tasks that a job demands? Tests of job knowledge? Job tenure or promotion? Direct cost accounting of workers’ output? There is no way to answer such a question decisively, for people may legitimately disagree about what it is about a worker’s performance that is most worth predicting. As a practical matter, ratings by supervisors, being the most readily obtained and the least intrusive in the workplace, have dominated the literature (Hunter 1986). But it is natural to wonder whether supervisor ratings, besides being easy to get, truly measure how well workers perform rather than, say, how they get along with the boss or how they look (Guion 1983).

To get a better fix on what the various measures of performance mean, it is useful to evaluate a number of studies that have included measures of cognitive ability, supervisor ratings, samples of work, and tests of job knowledge. Work samples are usually obtained by setting up stations for workers to do the various tasks required by their jobs and having their work evaluated in some reasonably objective way. Different occupations lend themselves more or less plausibly to this kind of simulated performance. The same is true of written or oral tests of job knowledge.

One of the field’s leaders, John Hunter, has examined the correlational structure that relates these different ways of looking at job performance to each other and to an intelligence test score (Hunter 1983, 1986). In a study of 1,800 workers, Hunter found a strong direct link between intelligence and job knowledge and a much smaller direct one between intelligence and performance in work sample tasks. By direct we mean that the variables predict each other without taking any other variable into account. The small direct link between intelligence and work sample was augmented by a large indirect link, via job knowledge: a person’s intelligence predicted his knowledge of the job, and his knowledge in turn predicted his work sample. The correlation (after the usual statistical corrections) between intelligence and job knowledge was .8; between intelligence and work sample it was .75. The indirect link between intelligence and work sample, via job knowledge, was larger by half than the direct one (Hunter 1986).
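The arithmetic behind “larger by half” can be made concrete with a simple path-tracing calculation. The sketch below is not Hunter’s actual model or his estimated coefficients; only the .8 correlation comes from the text, and the other two paths are illustrative values chosen so that the direct and indirect effects reproduce the pattern described above (a total intelligence-work sample correlation of about .75, with the indirect path through job knowledge about one and a half times the direct path).

```python
# Illustrative path-tracing sketch (not Hunter's actual model or coefficients).
# Model: intelligence -> job knowledge -> work sample, plus a direct
# intelligence -> work sample path. With standardized variables, the total
# correlation equals the direct path plus the product of the paths along
# the indirect route.

iq_to_knowledge = 0.80        # from the text: r(intelligence, job knowledge) = .8
knowledge_to_sample = 0.5625  # assumed value, chosen only for illustration
direct_iq_to_sample = 0.30    # assumed value, chosen only for illustration

indirect = iq_to_knowledge * knowledge_to_sample   # 0.45
total = direct_iq_to_sample + indirect             # 0.75

print(f"direct effect:     {direct_iq_to_sample:.2f}")
print(f"indirect effect:   {indirect:.2f} (via job knowledge)")
print(f"total correlation: {total:.2f}")
print(f"indirect / direct: {indirect / direct_iq_to_sample:.1f}")  # 1.5, i.e., 'larger by half'
```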

The correlation between intelligence and supervisor ratings in Hunter’s analysis was .47. Upon analysis, Hunter found that the primary reason for this link is that brighter workers know more about their jobs, and supervisors respond favorably to their knowledge. A comparable analysis of approximately 1,500 military personnel in four specialties produced the same basic finding (Hunter 1986). This may seem a weakness of the supervisor rating measure, but is it really? How much workers know about their jobs correlates, on the one hand, with their intelligence and, on the other, with both how they do on direct tests of their work and how they are rated by their supervisors. A worker’s intelligence influences how much he learns about the job, and job knowledge contributes to proficiency. Job knowledge also shapes the impression the worker makes on the supervisor more than the work itself does, as measured by a work sample test (which, of course, the supervisor may never see in the ordinary course of business). Using the supervisor rating as a measure of proficiency is thereby justified, without having to claim that the rating directly measures proficiency.

Hunter found that work samples are more dependent on intelligence and job knowledge than are supervisor ratings. Supervisor ratings, which are so predominant in this literature, may, in other words, underestimate how important intelligence is for proficiency. Recent research suggests that supervisor ratings in fact do underestimate the correlation between intelligence and productivity (Becker and Huselid 1992). But we should acknowledge again that none of the measures of proficiency—work samples, supervisor ratings, or job knowledge tests—is free of the taint of artificiality, let alone arbitrariness. Supervisor ratings may be biased in many ways; a test of job knowledge is a test, not a job; and even a worker going from one work station to another under the watchful eye of an industrial psychologist may be revealing something other than everyday competence. It has been suggested that the various contrived measures of workers tell us more about maximum performance than they do about typical, day-to-day proficiency (Guion 1983). We therefore advise that the quantitative estimates we present here (or that can be found in the technical literature at large) be considered only tentative and suggestive.

16
The average validity of .4 is obtained after standard statistical corrections of various sorts. The two most important of these are a correction for test unreliability or measurement error and a correction for restriction of range among the workers in any occupation. All of the validities in this section of the chapter are similarly corrected, unless otherwise noted.
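For readers who want to see what these corrections do to a raw coefficient, here is a minimal sketch using two standard textbook formulas: disattenuation for criterion unreliability and Thorndike’s Case II adjustment for restriction of range. The observed correlation, the assumed reliability, and the ratio of standard deviations are illustrative values, not figures drawn from any particular study cited in this chapter.

```python
import math

def correct_for_unreliability(r_obs: float, criterion_reliability: float) -> float:
    """Disattenuate an observed validity for measurement error in the criterion."""
    return r_obs / math.sqrt(criterion_reliability)

def correct_for_range_restriction(r_restricted: float, sd_ratio: float) -> float:
    """Thorndike Case II: sd_ratio = SD(applicant pool) / SD(selected workers)."""
    u = sd_ratio
    return (u * r_restricted) / math.sqrt(1 + r_restricted**2 * (u**2 - 1))

# Illustrative numbers only (not taken from the GATB literature).
r_observed = 0.25             # raw test-criterion correlation among job incumbents
criterion_reliability = 0.60  # assumed reliability of supervisor ratings
sd_ratio = 1.5                # assumed: incumbents' test scores are less spread out

r_step1 = correct_for_unreliability(r_observed, criterion_reliability)
r_step2 = correct_for_range_restriction(r_step1, sd_ratio)
print(f"observed {r_observed:.2f} -> corrected {r_step2:.2f}")  # roughly .46 here
```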

17
Ghiselli 1966, 1973; Hunter and Hunter 1984, Table 1.

18
Hunter 1980; Hunter and Hunter 1984.

19
Where available, ratings by peers, tests of job knowledge, and actual work samples often come close to ability measures as predictors of job performance (Hunter and Hunter 1984). But aptitude tests have the practical advantage that they can be administered relatively inexpensively to large numbers of applicants, and they do not depend on applicants’ having been on the job for any length of time.

20
E. F. Wonderlic & Associates 1983; Hunter 1989. These validities, which are even higher than the ones presented in the table on page 74, are for training success rather than for measures of job performance and are more directly comparable with the column for training success in the GATB studies than with the column for job proficiency. Regarding job performance, one major study evaluated the performance of about 1,500 air force enlisted men and women working in eight military specialties, chosen to be representative of military specialties in the air force. Performance was variously measured: by defining a set of tasks involved in each job and then training a group of evaluators to assess those specific tasks; by interviews of the personnel on technical aspects of their jobs; by supervisor ratings after training the supervisors; and by combinations of methods. The average correlation between AFQT score and a hands-on job performance measure was .40, with the highest values among the precision measurement equipment specialists and the avionics communications specialists and the lowest among the air traffic control operators and the air crew life support specialists. Insofar as the jobs were restricted to those held by enlisted personnel, the distribution of jobs was somewhat skewed toward the lower end of the skill range. We do not have an available estimate of the validity of the AFQT over all military jobs.

21
Hartigan and Wigdor 1989.

22
It is one of the chronically frustrating experiences of reading scientific results: Two sets of experts, supposedly using comparable data, come to markedly different conclusions, and the reasons for the difference are buried in technical and opaque language. How is a layperson to decide who is right? The differing estimates of the mean validity of the GATB—.45 according to Hunter, Schmidt, and some others; .25 according to the Hartigan committee—are an instructive case in point.

Sometimes the differences really are technical and opaque. For example, the Hartigan committee based its estimate on the assumption that the reliability of supervisor ratings was higher than other studies assumed—.8 instead of .6 (Hartigan and Wigdor 1989, p. 170). By assuming a higher reliability, the committee made a smaller correction for measurement error than Hunter did. Choosing between the Hartigan committee’s reliability of .8 for supervisor ratings and the .6 used by Hunter is impossible for anyone who is not intimately familiar with a large and scattered literature on that topic, and even then the choice remains a matter of judgment. But the Hartigan committee’s decision not to correct for restriction of range, which makes the largest difference in its estimate of the overall validity, rests on a different kind of disagreement. Here, a layperson is as qualified to decide as an expert, for this is a disagreement about what question is being answered.
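The practical effect of that single assumption is easy to see in the disattenuation formula itself: dividing by the square root of a larger reliability yields a smaller upward correction. A rough sketch, using a hypothetical observed validity of .25 purely as a stand-in for the raw coefficient:

```python
import math

r_observed = 0.25  # hypothetical raw validity, for illustration only

for reliability in (0.8, 0.6):  # Hartigan committee's assumption vs. Hunter's
    corrected = r_observed / math.sqrt(reliability)
    print(f"assumed reliability {reliability:.1f} -> corrected validity {corrected:.2f}")

# With reliability .8 the correction is smaller (about .28) than with .6 (about .32),
# so this choice by itself pulls the committee's final estimate downward.
```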

John Hunter and others assumed that for any job the applicant pool is the entire U.S. work force. That is, they sought an answer to the question, “What is the relationship between job performance and intelligence for the work force at large?” The Hartigan committee objected to their assumption on grounds that, in practice, the applicant pool for any particular job is not the entire U.S. work force but people who have a chance to get the job. As they accurately noted, “People gravitate to jobs for which they are potentially suited” (Hartigan and Wigdor 1989, p. 166).

But embedded in the committee’s objection to Hunter’s estimates is a tacit switch in the question that the analysis is supposed to answer. The Hartigan committee sought an answer to the question, “Among those people who apply for such-and-such a position, what is the relationship between intelligence and job performance?” If one’s objective is not to discourage people who weigh only 250 pounds from applying for jobs as tackles in the NFL, to return to our analogy, then the Hartigan committee’s question is the appropriate one. Of course, by minimizing the validity of weight, this approach may also encourage a large number of 150-pound linemen to apply for those jobs. Hence we conclude that the assumption used by Hunter and Schmidt (among others), that restriction of range calculations should be based on the entire work force, is the appropriate choice if one wants to know the overall relationship of IQ to job performance and its economic consequences.
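The difference between the two questions can be seen in a small simulation: generate a correlated test score and performance measure for a full “work force,” then recompute the correlation within a pool restricted to the upper part of the score distribution. The numbers below are entirely synthetic and illustrate only why the choice of reference population changes the validity estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
true_r = 0.5  # synthetic population correlation between test score and performance

# Bivariate normal "work force": standardized test score and job performance.
cov = [[1.0, true_r], [true_r, 1.0]]
score, performance = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

r_workforce = np.corrcoef(score, performance)[0, 1]

# "Applicant pool" restricted to the top half of the score distribution,
# mimicking people who gravitate toward jobs for which they are suited.
mask = score > np.median(score)
r_applicants = np.corrcoef(score[mask], performance[mask])[0, 1]

print(f"correlation in the whole work force: {r_workforce:.2f}")   # about .50
print(f"correlation in the restricted pool:  {r_applicants:.2f}")  # noticeably lower, about .33
```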

23
The ASVAB comprises ten subtests: General Science, Arithmetic Reasoning, Word Knowledge, Paragraph Comprehension, Numerical Operations, Coding Speed, Auto/Shop Information, Mathematics Knowledge, Mechanical Comprehension, and Electronics Information. Only Numerical Operations and Coding Speed are highly speeded; the other eight are nonspeeded “power” tests. All the armed services use the four MAGE composites, for Mechanical, Administrative, General, and Electronics specialties, each of which includes three or four subtests in a particular weighting. These composites are supposed to predict a recruit’s trainability for the particular specialty. The AFQT is yet another composite from the ASVAB, selected so as to measure g efficiently. See Appendix 3.

24
About 80 percent of the sample had graduated from high school and had no further civilian schooling, fewer than 1 percent had failed to graduate from high school, and fewer than 2 percent had graduated from college; the remainder had some post-high school civilian schooling short of a college degree. The modal person in the sample was a white male between 19 and 20 years old, but the sample also included thousands of women and people from all American ethnic groups; ages ranged upward from a minimum of 17, with almost 15 percent above 23 years (see Ree and Earles 1990b). Other studies, using educationally heterogeneous samples, have in fact shown that, holding AFQT constant, high school graduates are more likely than nongraduates to avoid disciplinary action, to be recommended for reenlistment, and to be promoted to higher rank (Office of the Assistant Secretary of Defense 1980). Current enlistment policies reflect the independent predictiveness of education, in that of two applicants with equal AFQT scores, the high school graduate is selected over the nongraduate if only one is to be accepted.

25
In fact, there may be some upward bias in these correlations, inasmuch as they were not cross validated to exclude capitalization on chance.

26
What does it mean to “account for the observed variation”? Think of it in this way: A group of recruits finishes its training course; their grades vary. How much less would they have varied had they entered the course with the same level of g? This may seem like a hypothetical question, but it is answered simply by squaring the correlation between the recruits’ level of g and their final grades. In general, given any two variables, the degree to which variation in either is explained (or accounted for, in statistical lingo) by the other variable is obtained by squaring the correlation between them. For example, a perfect correlation of 1 between two variables means that each of the variables fully explains the observed variation in the other. When two variables are perfectly correlated, they are also perfectly redundant, since if we know the value of one of them, we also know the value of the other without having to measure it. Hence, 1 squared is 1.0, or 100 percent. A correlation of .5 means that each variable explains, or accounts for, 25 percent of the observed variation in the other; a correlation of 0 means that neither variable accounts for any of the observed variation in the other.
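In code, the computation described in this note is a one-liner: square the correlation. A trivial illustration, using the three values mentioned above:

```python
# Variance accounted for is the square of the correlation coefficient.
for r in (1.0, 0.5, 0.0):
    print(f"correlation {r:.1f} -> variance accounted for: {r**2:.0%}")
# Prints 100%, 25%, and 0%, matching the examples in the note.
```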
