Intelligence Testing and Cultural Diversity: Pitfalls and Promises

Donna Y. Ford¹
Vanderbilt University
Nashville, TN

Background: Confusion and Controversy

There is a great deal of concern and debate about the low performance of racially and linguistically diverse students—African Americans, Hispanic Americans, and Native Americans—on standardized tests, as well as their under-representation in gifted education. Nowhere are the debates and controversies surrounding intelligence more prevalent than in gifted education and special education. These two educational fields rely extensively on tests to make educational and placement decisions. In gifted education, low test scores often prevent diverse students from being identified as gifted and receiving services; in special education, low test scores often result in identifications such as learning disabled, mentally retarded, and so forth. Racially and linguistically diverse students (African Americans, Hispanic Americans, and Native Americans) are under-represented in gifted education and over-represented in special education (see Council of State Directors of Programs for the Gifted and National Association for Gifted Children [NAGC], 2003; U.S. Department of Education, 2003).

There are two persistent, major debates or controversies surrounding minority students’ intelligence test performance. In one camp, scholars argue that the low test performance of minority students can be attributed to cultural deprivation or disadvantage(s); connotatively, this refers to the notion of diverse students being inferior to other students (see Rushton, 2003). Unfortunately, deficit thinking orientations are present even today (e.g., Ford, Harris, Tyson, & Frazier Trotman, 2002). For instance, Frasier, García, and Passow (1995) and Harmon (2002) argued that teachers tend not to refer racially and culturally diverse students to gifted programs because of the teachers’ deficit thinking and stereotypes about diverse students. When the focus is on what diverse students cannot do rather than what they can do, they are not likely to be referred for gifted education services.

In a different camp, scholars argue that minority students are culturally different, but not culturally disadvantaged or deficient (e.g., Boykin, 1986; Delpit, 1995; Erickson, 2004; Nieto, 1999; Rodriguez & Bellanca, 1996; Shade, Kelly, & Oberg, 1997). These individuals acknowledge that culture impacts test performance, but they do not equate or associate low performance with inferiority.

Beyond the ongoing debates about the source of intelligence, there are equally spirited and rigorous debates about the use of standardized tests with diverse groups, with the greatest attention to issues of test bias (Armour-Thomas, 1992; Helms, 1992). Publications on test bias seem to have waned in the last decade, although The Bell Curve (Herrnstein & Murray, 1994) generated renewed debates and controversy. Many test developers have gone to great lengths to decrease or eliminate (if this is possible) culturally biased (or culturally-loaded) test items (Johnsen, 2004). Accordingly, some scholars contend that test bias no longer exists (e.g., Fancher, 1995; Jensen, 1998, 2000; Rushton, 2003). Others contend that tests can be culturally-reduced, that bias can be decreased; still others contend that tests can never be bias free or culturally neutral because they are developed by people, they reflect the culture of the test developer, and absolute fairness to every examinee is impossible to attain, if for no other reason than that tests have imperfect reliability and that validity in any particular context is a matter of degree (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, hereafter referred to as “Joint Standards,” 1999).

In sum, there is little consensus in education (and psychology) about the reasons diverse students score lower on standardized tests of intelligence than do White students. Further, there is little consensus regarding the definition of intelligence, the definition of test bias, the existence of test bias, the types of test biases, the impact of test bias on diverse students, and the nature and extent of test bias in contemporary or newly re-normed tests.

With so many unanswered questions and controversies regarding intelligence, testing in general, and testing diverse students in particular, what can educators in gifted education do to ensure that these students have access to and are represented in gifted education programs and services?

Testing Issues and Diverse Populations

There is a longstanding and persistent debate regarding the equitable use of tests and assessment strategies with diverse populations. This debate and related concerns are especially prevalent in cases of high-stakes testing, where tests are used to make important and long-term educational decisions about students. As Lam (1993) observed, once test scores become numbers in students’ files, they provide the basis for high-stakes decisions concerning placement, selection, certification, and promotion that are made without consideration of the inequities surrounding testing in general and testing culturally diverse students in particular.

Psychological and psychoeducational assessment is an area that has been heavily subjected to complaints about the differential treatment of diverse groups. Korchin (1980) and others contend that standardized tests have contributed to the perpetuation of social, economic, and political barriers confronting diverse groups (Padilla & Medina, 1996; Suzuki, Meller, & Ponterotto, 1996). Specifically, questions have been raised regarding whether standardized intelligence tests are biased. Tests can be biased in terms of impact (e.g., how they are used) and statistically. Tests are biased in terms of impact if they treat groups unfairly or discriminate against diverse groups by, for example, “underestimating their potential or over-pathologizing their symptoms” (Suzuki et al., 1996, p. xiii). This concept is referred to as disparate impact (Office for Civil Rights [OCR], 2000) and may not be associated with statistical bias, defined next. The Joint Standards (1999) defined statistical bias as a systematic error in a test score. In discussing test fairness, statistical bias may refer to construct under-representation or construct-irrelevant components of test scores that differentially affect the performance of different groups of test takers. Thus, it is important to note that when tests are used for selecting and screening, the potential for denying diverse groups access to educational opportunities, such as gifted education programs, due to bias is great.

The consequences of interpretation bias are grave. For instance, because many school districts rely on a single test score to place students in gifted education programs², and given the lower performance of diverse groups on tests, this practice serves as an effective gate-keeping mechanism. Interpreting test performance, whether high or low, based on one test or measure must be avoided due to the limited data provided from a single score. NAGC (1997), OCR (2000), and Joint Standards (1999) have noted the serious limitations and negative consequences (e.g., disparate impact) of using one test score to identify students as gifted and to determine their need for placement in gifted education programs. In other words:

Tests are not perfect. Test questions are a sample of possible questions that could be asked in a given area. Moreover, a test score is not an exact measure of a student’s knowledge or skills. A student’s scores can be expected to vary across different versions of a test—within a margin of error determined by the reliability of the test, and as a function of the particular sample of questions asked and/or transitory factors, such as the student’s health on the day of the tests. Thus, no single test score can be considered a definitive measure of a student’s knowledge. (OCR, 2000, p. 14)
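
The margin of error described in the OCR passage can be made concrete with the standard error of measurement from classical test theory, SEM = SD × sqrt(1 − reliability). The Python sketch below is illustrative only. The standard deviation of 15, the reliability of .90, the observed score of 126, and the cutoff of 130 are assumed values chosen for the example; they are not figures from OCR (2000) or from any particular test or district, and the function names are hypothetical.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def score_band(observed: float, sd: float, reliability: float, z: float = 1.96):
    """Return an approximate 95% band (low, high) around an observed score."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem


# Assumed, illustrative values (not taken from any specific test manual).
SD = 15.0           # standard deviation of the score scale
RELIABILITY = 0.90  # reliability coefficient of the test
OBSERVED = 126.0    # one student's observed score
CUTOFF = 130.0      # hypothetical single-score placement cutoff

low, high = score_band(OBSERVED, SD, RELIABILITY)
cutoff_within_band = low <= CUTOFF <= high

print(f"SEM = {standard_error_of_measurement(SD, RELIABILITY):.2f}")
print(f"Approximate 95% band: {low:.1f} to {high:.1f}")
print(f"Cutoff {CUTOFF:.0f} falls inside the band: {cutoff_within_band}")
```

Under these assumed values, the band around the observed score spans the cutoff, which illustrates why a single score near a cutoff cannot by itself settle a placement decision.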

Our basic obligation as educators is to meet the needs of students as they come to us—with their different learning styles, economic backgrounds, cultural backgrounds, and academic skills. In Larry P. v. Riles (1979), the court argued:

If tests predict that a person is going to be a poor employee, the employer can legitimately deny the person the job, but if tests suggest that a young child is probably going to be a poor student, a school cannot on that basis alone deny that child the opportunity to improve and develop the academic skills necessary to succeed in our society.

Stated differently, educators in gifted education must not only teach students who already demonstrate their gifts and talents; they must also address student potential and, thus, create talent development models (Callahan & McIntyre, 1994; USDE, 1993, 1998).

The Influence of Culture on Test Performance: African-American Students as a Case in Point

Culture can be defined as the collective beliefs, attitudes, traditions, customs, and behaviors that serve as a filter through which a group of people view and respond to the world (Erickson, 2004; Ford & Harris, 1999; Ford et al., 2002; Hall, 1976). Culture is a way of life, a way of looking at and interpreting life, and a way of responding to life. This definition becomes clearer when one thinks of “the terrible twos,” the teen or adolescent culture, the culture of poverty, and so forth. Members of these groups have in common beliefs, attitudes, traditions, customs, and behaviors (e.g., Storti, 1998).

In a thoughtful and compelling monograph entitled A New Window for Looking at Gifted Children, Frasier et al. (1995) state, “Manifestation of characteristics associated with giftedness may be different in minority children, yet educators are seldom trained in identifying those behaviors in ways other than the way they are observed in the majority culture” (p. 33). This statement was confirmed in a study that included teachers’ perceptions of giftedness among diverse students (Frasier et al., 1995). Likewise, Helms (1992) asks:

  1. Is there evidence that the culturally conditioned intellectual skills used by Blacks and Whites generally differ and that these differences have been equivalently incorporated into the measurement procedures?
  2. Do Blacks and Whites use the same test-taking strategies when ostensibly responding to the same material, and do these strategies have equivalent meaning?
  3. If different strategies are used by the racial groups, to what extent are these differences an aspect of test predictors and test criteria?
  4. How does one measure the cultural characteristics of intelligence tests? (p. 1097)

The implications of these questions for educators are that, when differences in performance on intelligence tests are attributed to racial or ethnic differences, educators must recognize this explanation for the non sequitur that it is. Instead of continuing to use such measures until something better comes along, educators must challenge the scientists on whose work their test usage is based to find culturally defined psychological explanations (e.g., culture-specific attitudes, feelings, and behaviors) for why such racial and ethnic differences exist (Helms, 1992, p. 1097).

Lam (1993) discussed five assumptions (or misassumptions) that summarize the many concerns that persist relative to intelligence testing and diverse groups:

  1. Test developers assume that test takers have no linguistic barriers (or differences) that inhibit their performance on tests.
  2. Test developers assume that the content of the test at any particular level is suitable and of nearly equal difficulty for test takers.
  3. Test developers assume that test takers are familiar with or have the test sophistication for taking standardized tests.
  4. Test developers assume that test takers are properly motivated to do well on the test.
  5. Test developers assume that test takers do not have strong negative psychological reactions to testing.

Promising Practices and Considerations

Intelligence tests are here to stay. However, educators are not bound by their exclusive use. Educators do not have to be “slaves” to tests; instead, they can work to ensure that tests, policies and procedures, as described below, are valid, reliable and fair. The first step is to develop culturally sensitive assumptions.

Culturally Sensitive Assumptions

The accuracy and appropriateness of the intellectual assessment process is based on a number of assumptions, a few of which were discussed earlier. Kaufman (1990, 1994) suggested alternative assumptions worthy of adoption because they offer promise in making testing more culturally sensitive:

  1. The focus of an assessment is the person being assessed, not the test (Kaufman, 1990). Professionals should not become preoccupied with the IQ scores to the detriment of the individual being assessed.
  2. The goal of any examiner is to be better than the tests he/she uses (Kaufman, 1990). It requires knowledge, skills, and cultural competence to make a complete and comprehensive assessment of diverse groups.
  3. Intelligence tests measure what the individual has learned (Kaufman, 1990). The content of all tasks, whether verbal or non-verbal, is learned within a culture (Miller, 1996). Therefore, all tests are culturally-loaded.
  4. The tasks composing intelligence tests are illustrative samples of behavior and are not meant to be exhaustive (Kaufman, 1994). Collateral information (e.g., learning styles, motivation, interests, health) must be collected to develop a profile of an individual’s strengths and weaknesses and to, then, develop educational interventions and opportunities.
  5. Intelligence tests measure mental functioning under fixed experimental conditions (Kaufman, 1990). As such, how individuals will demonstrate their intelligence in other settings cannot be accurately predicted without gathering extensive information—test information and non-test information—on individuals in other settings.
  6. IQ tests must be interpreted on an individual basis by a “shrewd and flexible detective” (Kaufman, 1990, p. 27). Professionals must investigate all information collected on students in order to provide a comprehensive picture of the individual in his/her cultural context.
  7. Intelligence tests are best used to generate hypotheses of potential help to the person; they are misused when the results lead to harmful outcomes (Kaufman, 1990). Too often, data obtained from intelligence tests have been used to indicate the inferiority of culturally diverse groups (see lengthy discussions on this topic by Gould, 1995 and Fancher, 1995). Professionals need to move beyond deficit thinking when assessing diverse populations (Ford et al., 2002; Samuda, 1998).
  8. Validity and reliability are established not only by test developers but also by test users and interpreters.

Sandoval, Frisby, Geisinger, Scheuneman, and Grenier (1998) offered the following recommendations for promoting equitable assessments with diverse groups; these recommendations focus primarily on ways to improve interpretations of diverse students’ scores:

  1. Identify preconceptions: Professionals must identify their conceptions and viewpoints, both negative and positive, about diverse groups, and recognize that these perceptions influence their assessment of diverse groups.
  2. Develop complex schemes or conceptions of groups: A major problem with interpreting the test scores of diverse groups is that results are examined with little regard to the many factors that affect the lives and performance of these groups.
  3. Actively search for disconfirmatory evidence: When using and interpreting test scores, especially low test scores, of diverse groups, professionals must constantly search for alternative explanations. For example, central questions are: “Did the individual have the opportunity to learn the information or to express it on the test?” “How does the individual’s culture affect his/her test performance?”
  4. Resist a rush to judgment: Professionals must be reflective, thoughtful, and inquisitive in their practice of interpreting and using test scores with diverse groups. In order to avoid rushing to judgment, Kaufman (1994) recommended that professionals spend time interacting in the neighborhoods that are serviced by their schools as a firsthand means of learning local cultural values, traditions, and customs.

Summary: Guiding Principles for Equitable and Culturally Responsive Assessment

Regardless of whether one is using traditional intelligence tests or tests considered to be less culturally-loaded, testing, assessment, test interpretation, and test use must be guided by sound, defensible, and equitable principles and practices. The following guiding principles are offered for consideration:

  1. Every school system must be committed to equity in finding potentially gifted students; this goal is non-negotiable (Frasier et al., 1995).
  2. In addition to examining test bias, we must examine test fairness (Gregory, 2004). We must not become complacent in the belief that finding a test to be unbiased means that the test is fair—an unbiased test can still be unfair (Gregory, 2004). Test bias and test fairness should be explored.
  3. The effects of threats to a test’s validity and reliability must be examined and considered when interpreting and using test scores (Joint Standards, 1999).
  4. A given pattern of test performances represents a cross-sectional view of the individual being assessed within a particular context (i.e., ethnic, cultural, familial, social) (Joint Standards, 1999).
  5. There is no test score that can tell, ex post facto, the native potential that a student may have had at birth (Samuda, 1998); do not overvalue IQs or treat them as a magical manifestation of a child’s inborn potential (Kaufman, 1994); do not over-interpret test scores by assigning them undue power.
  6. Test scores should not be allowed to override other sources of evidence about test takers (Joint Standards, 1999).
  7. In educational settings, a decision or characterization that will have major impact on a student should not be made on the basis of a single test score (NAGC, 1997). Other relevant information should be taken into account if it will enhance the overall validity of the decision (Joint Standards, 1999).
  8. Comprehensive assessment, the gathering of a wide range of information about test takers, helps to place test scores into a socio-cultural context by considering how an examinee’s performance is influenced by acculturation, language proficiency, socioeconomic background, and ethnic/racial identity (Samuda, Feuerstein, Kaufman, Lewis, & Sternberg, 1998) . . . comprehensive assessment is a continuous process and the assessor must learn as much as possible about the test taker’s culture . . . and level of acculturation.
  9. It is the responsibility of those who mandate the use of tests to identify and monitor their impact and to minimize potential negative consequences. Consequences resulting from the uses of the test, both intended and unintended, should also be examined by the test user (Joint Standards, 1999).
  10. In cases where a language-oriented test is inappropriate due to the test takers’ limited proficiency in that language, a non-verbal test may be a suitable alternative (Joint Standards, 1999). Both verbal and non-verbal tests can provide balanced and important information about diverse students (Samuda et al., 1998).
  11. When interpreting test scores, the examiner or tester must take into account that many traditional tests have not been normed adequately with various cultural groups (Samuda et al., 1998); test users must be constantly aware of the limitations of standardized tests (Kaufman, 1994).
  12. The ultimate responsibility for appropriate test use and interpretation lies predominantly with test users (Joint Standards, 1999); they must gain experience in working with culturally diverse groups in order to improve their ability to interpret and effectively use test scores (Kaufman, 1994).
  13. Tests selected should be suitable for the characteristics and background of the test taker (Joint Standards, 1999). Test scores must not be interpreted and used in a color-blind or culture-blind fashion (Ford, 1996).
  14. Every effort must be made to eliminate prejudice, racism and inequities and to provide accurate and meaningful scores linked to appropriate intervention strategies (Samuda et al., 1998). Essentially, test scores should be used to help students, not to hurt them.

Selecting, interpreting and using tests are complicated endeavors. When one adds student differences, including cultural diversity, to the situation, the complexity increases. The nature-nurture debate was discussed only briefly. Little attention was given to this controversy because the discussion is convoluted: for every publication that convincingly argues for the heredity position, an equally compelling publication argues for the environmental position. Likewise, for every publication that argues persuasively against the existence of test bias, a counterargument convincingly contends that tests continue to be biased against diverse groups.

There is no debate, however, that culturally and linguistically diverse students are consistently under-represented in gifted programs. Under-representation exists primarily because of diverse students’ performance on traditional intelligence tests. These tests have served as gatekeepers for diverse students. Suggestions for ensuring equitable, culturally responsive assessment practices were provided, along with attention to alternative tests, namely non-verbal ability tests. Professionals must be vigilant about identifying and addressing factors that hinder the test performance of diverse students. Tests are tools. The ultimate responsibility for equitable assessment rests with those who develop, administer, interpret, and use tests. Tests in and of themselves are harmless; they become harmful when misunderstood and misused. Historically, diverse students have been harmed educationally by test misuse. The pedagogical clock is ticking. What better time than today to be more responsible in eliminating barriers to the representation of diverse students in gifted education? A mind is a terrible thing to waste; a mind is a terrible thing to erase (Ford & Harris, 1999).

1. This article is based on the monograph by Ford (2004) entitled Intelligence Testing and Cultural Diversity: Concerns, Cautions and Considerations, The National Research Center on the Gifted and Talented, University of Connecticut, Storrs, CT.

2. According to the most recent report by the Council of State Directors of Programs for the Gifted and the National Association for Gifted Children (2003), in 2001-2002, only 24 states mandated non-discriminatory testing in their gifted education policies and procedures, while 18 reported no such mandate (pp. 53-54). Further, several states reported using one score to make placement decisions (e.g., Arizona, Oregon, Ohio).


