Published in: Journal of Pediatric Neuropsychology 1/2023

Open Access 1 March 2023

Assessment of Cultural Bias on the PdPVTS Across Gender and Racial/Ethnic Groups

Authors: Robert J. McCaffrey, Cecil R. Reynolds, Julie K. Lynch, Robert A. Leark, Robert Ramkhalawansingh

Abstract

Performance validity assessment has become a standard component of psychoeducational and neuropsychological evaluations (Sweet et al., 2021) but there is evidence to demonstrate that cultural variables can influence performance validity test outcomes. Given the critical role that performance validity assessment plays in determining the accuracy of neuropsychological and psychoeducational test data, it is imperative that these tests are unbiased across culturally diverse samples. The purpose of the current investigation was to examine the impact of race/ethnicity and gender on the Pediatric Performance Validity Test Suite (PdPVTS; McCaffrey, Lynch, Leark, & Reynolds, 2020). A general population sample of n = 838 examinees was collected and demographically matched subsamples were established to compare gender and racial/ethnic groups while controlling for other confounding demographic variables. Classification/failure rates revealed little evidence of adverse impact across gender and racial/ethnic groups. Mean score equivalency between demographic groups was achieved in most cases. Instances where evidence of equivalence could not be established, namely for the Story Questions, were associated with smaller sample sizes and lower statistical power. Taken together, the current study demonstrates that examiners may use the PdPVTS with confidence that pass/fail classifications will not be associated with gender or race/ethnicity for Black, White, and Hispanic examinees.
Notes
The original online version of this article was revised for retrospective open access.
Guest Action Editor Dr. Arthur McNeill Horton oversaw the review and acceptance of this article.
A correction to this article is available online at https://doi.org/10.1007/s40817-022-00139-9.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Introduction

Test fairness is a fundamental principle of test development and refers to the extent to which a test measures the construct of interest and not characteristics of a specific group. Test fairness is linked to test bias, which is defined as empirically established systematic error in a test score. When the systematic error is attributable to a cultural/ethnic variable, the test is considered culturally biased (Newman et al., 2007; Reynolds et al., 2021).
The potential cultural bias of standard neuropsychological and psychoeducational tests has been an area of discussion and research for decades. As performance validity assessment has become a standard component of psychoeducational and neuropsychological evaluations (Sweet et al., 2021), the study of the possible influence of cultural variables on performance validity tests has been developing. This research, to date, points to the relevance of several culture-related variables for performance validity tests, including proficiency in English, bilingualism vs. monolingualism, educational attainment, and country of origin. For example, non-English-speaking individuals with low educational attainment may have a higher failure rate using standard cut scores on some freestanding performance validity tests, specifically the Test of Memory Malingering, the Rey-15 Item Memory Test, and the b Test. Bilingual individuals, with research primarily examining Spanish–English speakers, may have higher failure rates on the Rey-15 Item Test, Warrington Recognition Memory Test, Rey Word Recognition Test, and Dot Counting Test. Research involving ethnically and linguistically diverse individuals has supported the use of the Genuine Memory Impairment Profile from the Word Memory Test, but not the use of the primary validity indices in isolation. Embedded performance validity tests, namely the Reliable Digit Span and the Digit Span Age-corrected Scaled score, have been found to have higher failure rates using standard cut scores with Hispanic and Native American individuals, bilingual individuals (Spanish/English), and individuals with low educational attainment (see Salazar et al., 2021; Strutt & Stinson, 2022, for review).
Limited research on the impact of race on performance validity tests has been conducted, with the general findings indicating no difference in the performance of racially diverse samples on many performance validity tests. Hood et al. (2022) found that clinically referred Black and White American adults performed generally comparably across a number of freestanding and embedded performance validity tests. Importantly, the two groups were equated for age and educational level. In a pediatric sample, Bosworth and Dodd (2020) found that Black, Hispanic, and Asian youth with a history of mild traumatic brain injury performed comparably on the Non-Verbal Medical Symptom Validity Test (NV-MSVT). There was also no impact of gender on the NV-MSVT.
Performance validity assessment is necessary in determining the accuracy of neuropsychological and psychoeducational test data and should be a component of every evaluation, including pediatric evaluations (Emhoff et al., 2018; Sweet et al., 2021). It is essential that these tests are culturally fair, and this necessitates validation within culturally diverse samples. The current study examined the impact of race and gender on the Pediatric Performance Validity Test Suite (PdPVTS; McCaffrey et al., 2020). The PdPVTS comprises five computer-based tests that were specifically designed to assess performance validity in children and adolescents. The principles of universal design were kept at the forefront in developing a test battery that is accessible to youth of varying physical and cognitive capabilities and that uses culturally neutral and gender-neutral test stimuli. PdPVTS performance was examined across racial groups and gender. Considering the attention given to cultural variables in developing the test stimuli, we expected that racial and gender groups would perform comparably across the PdPVTS.

Methods

Participants

A general population sample of n = 838 examinees was collected. Sampling was stratified across age, gender, race/ethnic group, geographic region, and parental education level (PEL) in order to match the demographic composition of the US population. Data collection took place in the fall of 2017 and thus the demographic composition of the sample was matched to the 2017 US Census data (2017 American Community Survey; United States Census Bureau, 2017). During the development of the PdPVTS, the objective was to start with eight tests and to eliminate tests that did not perform as well to arrive at a final set of five tests that maximize sensitivity and specificity in discerning performance validity. Due to this experimental design, not all examinees received each of the final tests in the PdPVTS and, therefore, the total number of examinees in the general population sample who completed a given PdPVTS test ranged from n = 431 to n = 563.
Examinees were excluded from the sample if they had any uncorrected hearing, visual, or motor impairments that might affect their ability to use a tablet or other touchscreen devices to complete the PdPVTS. Examinees were also excluded if they could not communicate verbally. As this was a general population sample, examinees were excluded if they were diagnosed with any psychological, neurological, behavioral, or learning-related disorders. Examinees were excluded if the examiners observed any anomalies with respect to their interaction with the tablet or any of the tests in the PdPVTS (n = 2).
Given that the primary objective of the current investigation was to compare performance on the PdPVTS between genders and among racial/ethnic groups, demographically matched samples were selected to facilitate each set of comparisons while controlling for confounding demographic variables. To compare male and female respondents, subsamples were created for each test in the PdPVTS that were matched on age group, race/ethnicity, and parental education level. To compare Black respondents to White respondents, and to compare Hispanic respondents to White respondents, subsamples were created for each test that were matched on age group and gender only, due to sample size restrictions. Where possible, the same matched samples were used across multiple tests in the PdPVTS. The demographic composition of the male vs. female matched samples is summarized in Table 1, and the demographic characteristics of the Black vs. White and Hispanic vs. White matched subsamples are summarized in Tables 2 and 3, respectively.
Table 1
Demographic characteristics of the male vs. female matched samples (cell entries are N (%) unless otherwise indicated)

| Characteristic | Find the Animal, Silhouettes: Male | Find the Animal, Silhouettes: Female | Matching, Shape Learning: Male | Matching, Shape Learning: Female | Story Questions (ages 7–11): Male | Story Questions (ages 7–11): Female | Story Questions (ages 12–18): Male | Story Questions (ages 12–18): Female |
|---|---|---|---|---|---|---|---|---|
| Parental education level | | | | | | | | |
| No high school diploma | 18 (8.6) | 18 (8.6) | 21 (10.3) | 21 (10.3) | 15 (10.9) | 15 (10.9) | 6 (9.2) | 6 (9.2) |
| High school graduate | 68 (32.5) | 68 (32.5) | 67 (33.0) | 67 (33.0) | 47 (34.1) | 47 (34.1) | 20 (30.8) | 20 (30.8) |
| Some college/associate’s | 63 (30.1) | 63 (30.1) | 63 (31.0) | 63 (31.0) | 42 (30.4) | 42 (30.4) | 21 (32.3) | 21 (32.3) |
| Bachelor’s degree | 47 (22.5) | 47 (22.5) | 37 (18.2) | 37 (18.2) | 23 (16.7) | 23 (16.7) | 14 (21.5) | 14 (21.5) |
| Graduate/professional degree | 13 (6.2) | 13 (6.2) | 15 (7.4) | 15 (7.4) | 11 (8.0) | 11 (8.0) | 4 (6.2) | 4 (6.2) |
| Race/ethnicity | | | | | | | | |
| Black | 19 (9.1) | 19 (9.1) | 21 (10.3) | 21 (10.3) | 15 (10.9) | 15 (10.9) | 6 (9.2) | 6 (9.2) |
| Hispanic | 47 (22.5) | 47 (22.5) | 47 (23.2) | 47 (23.2) | 31 (22.5) | 31 (22.5) | 16 (24.6) | 16 (24.6) |
| White | 130 (62.2) | 130 (62.2) | 122 (60.1) | 122 (60.1) | 83 (60.1) | 83 (60.1) | 39 (60.0) | 39 (60.0) |
| Other | 13 (6.2) | 13 (6.2) | 13 (6.4) | 13 (6.4) | 9 (6.5) | 9 (6.5) | 4 (6.2) | 4 (6.2) |
| Region | | | | | | | | |
| Midwest | 45 (21.5) | 48 (23.0) | 43 (21.2) | 41 (20.2) | 30 (21.7) | 27 (19.6) | 13 (20.0) | 14 (21.5) |
| Northeast | 35 (16.7) | 38 (18.2) | 35 (17.2) | 32 (15.8) | 22 (15.9) | 21 (15.2) | 13 (20.0) | 11 (16.9) |
| South | 75 (35.9) | 75 (35.9) | 80 (39.4) | 78 (38.4) | 57 (41.3) | 55 (39.9) | 23 (35.4) | 23 (35.4) |
| West | 54 (25.8) | 48 (23.0) | 45 (22.2) | 52 (25.6) | 29 (21.0) | 35 (25.4) | 16 (24.6) | 17 (26.2) |
| Age | | | | | | | | |
| Range | Min = 5, Max = 18 | Min = 5, Max = 18 | Min = 5, Max = 18 | Min = 5, Max = 18 | Min = 5, Max = 11 | Min = 5, Max = 11 | Min = 12, Max = 18 | Min = 12, Max = 18 |
| Mean and SD | M = 9.36, SD = 4.18 | M = 9.27, SD = 4.09 | M = 9.38, SD = 4.05 | M = 9.37, SD = 4.06 | M = 6.91, SD = 1.87 | M = 6.94, SD = 1.91 | M = 14.63, SD = 1.81 | M = 14.54, SD = 2.12 |
| Total n | 209 | 209 | 203 | 203 | 138 | 138 | 65 | 65 |

The same matched samples were used for the Find the Animal and Silhouettes tests and the same matched samples were used for the Matching and Shape Learning tests
Table 2
Demographic characteristics of the Black vs. White matched samples (cell entries are N (%) unless otherwise indicated)

| Characteristic | Find the Animal, Silhouettes: Black | Find the Animal, Silhouettes: White | Matching, Shape Learning: Black | Matching, Shape Learning: White | Story Questions (ages 7–11): Black | Story Questions (ages 7–11): White | Story Questions (ages 12–18): Black | Story Questions (ages 12–18): White |
|---|---|---|---|---|---|---|---|---|
| Gender | | | | | | | | |
| Male | 39 (50.6) | 39 (50.6) | 35 (46.1) | 35 (46.1) | 24 (46.2) | 24 (46.2) | 11 (45.8) | 11 (45.8) |
| Female | 38 (49.4) | 38 (49.4) | 41 (53.9) | 41 (53.9) | 28 (53.8) | 28 (53.8) | 13 (54.2) | 13 (54.2) |
| Parental education level | | | | | | | | |
| No high school diploma | 8 (10.4) | 1 (1.3) | 8 (10.5) | 1 (1.3) | 5 (9.6) | 1 (1.9) | 3 (12.5) | 0 (0.0) |
| High school graduate | 24 (31.2) | 27 (35.1) | 27 (35.5) | 27 (35.5) | 19 (36.5) | 19 (36.5) | 8 (33.3) | 8 (33.3) |
| Some college/associate’s | 29 (37.7) | 24 (31.2) | 24 (31.6) | 20 (26.3) | 17 (32.7) | 13 (25.0) | 7 (29.2) | 7 (29.2) |
| Bachelor’s degree | 13 (16.9) | 24 (31.2) | 10 (13.2) | 22 (28.9) | 6 (11.5) | 14 (26.9) | 4 (16.7) | 8 (33.3) |
| Graduate/professional degree | 3 (3.9) | 1 (1.3) | 7 (9.2) | 6 (7.9) | 5 (9.6) | 5 (9.6) | 2 (8.3) | 1 (4.2) |
| Region | | | | | | | | |
| Midwest | 16 (20.8) | 26 (33.8) | 15 (19.7) | 16 (21.1) | 11 (21.2) | 11 (21.2) | 4 (16.7) | 5 (20.8) |
| Northeast | 10 (13.0) | 17 (22.1) | 10 (13.2) | 11 (14.5) | 6 (11.5) | 8 (15.4) | 4 (16.7) | 3 (12.5) |
| South | 44 (57.1) | 23 (29.9) | 44 (57.9) | 34 (44.7) | 30 (57.7) | 22 (42.3) | 14 (58.3) | 12 (50.0) |
| West | 7 (9.1) | 11 (14.3) | 7 (9.2) | 15 (19.7) | 5 (9.6) | 11 (21.2) | 2 (8.3) | 4 (16.7) |
| Age | | | | | | | | |
| Range | Min = 5, Max = 18 | Min = 5, Max = 18 | Min = 5, Max = 18 | Min = 5, Max = 18 | Min = 5, Max = 11 | Min = 5, Max = 11 | Min = 12, Max = 18 | Min = 12, Max = 18 |
| Mean and SD | M = 9.09, SD = 3.97 | M = 9.09, SD = 4.34 | M = 9.33, SD = 4.05 | M = 9.21, SD = 3.98 | M = 6.94, SD = 1.89 | M = 6.85, SD = 1.88 | M = 14.5, SD = 2.19 | M = 14.33, SD = 1.95 |
| Total n | 77 | 77 | 76 | 76 | 52 | 52 | 24 | 24 |

The same matched samples were used for the Find the Animal and Silhouettes tests and the same matched samples were used for the Matching and Shape Learning tests
Table 3
Demographic characteristics of the Hispanic vs. White matched samples (cell entries are N (%) unless otherwise indicated)

| Characteristic | Find the Animal, Silhouettes: Hispanic | Find the Animal, Silhouettes: White | Matching, Shape Learning: Hispanic | Matching, Shape Learning: White | Story Questions (ages 7–11): Hispanic | Story Questions (ages 7–11): White | Story Questions (ages 12–18): Hispanic | Story Questions (ages 12–18): White |
|---|---|---|---|---|---|---|---|---|
| Gender | | | | | | | | |
| Male | 69 (50.4) | 69 (50.4) | 66 (49.6) | 66 (49.6) | 47 (52.8) | 47 (52.8) | 19 (43.2) | 19 (43.2) |
| Female | 68 (49.6) | 68 (49.6) | 67 (50.4) | 67 (50.4) | 42 (47.2) | 42 (47.2) | 25 (56.8) | 25 (56.8) |
| Parental education level | | | | | | | | |
| No high school diploma | 29 (21.2) | 5 (3.6) | 33 (24.8) | 6 (4.5) | 20 (22.5) | 5 (5.6) | 13 (29.5) | 1 (2.3) |
| High school graduate | 47 (34.3) | 49 (35.8) | 41 (30.8) | 42 (31.6) | 28 (31.5) | 29 (32.6) | 13 (29.5) | 13 (29.5) |
| Some college/associate’s | 36 (26.3) | 37 (27.0) | 36 (27.1) | 40 (30.1) | 24 (27.0) | 26 (29.2) | 12 (27.3) | 14 (31.8) |
| Bachelor’s degree | 20 (14.6) | 41 (29.9) | 17 (12.8) | 37 (27.8) | 11 (12.4) | 23 (25.8) | 6 (13.6) | 14 (31.8) |
| Graduate/professional degree | 5 (3.6) | 5 (3.6) | 6 (4.5) | 8 (6.0) | 6 (6.7) | 6 (6.7) | 0 (0.0) | 2 (4.5) |
| Region | | | | | | | | |
| Midwest | 13 (9.5) | 41 (29.9) | 14 (10.5) | 31 (23.3) | 9 (10.1) | 20 (22.5) | 5 (11.4) | 11 (25.0) |
| Northeast | 16 (11.7) | 27 (19.7) | 16 (12.0) | 25 (18.8) | 11 (12.4) | 19 (21.3) | 5 (11.4) | 6 (13.6) |
| South | 49 (35.8) | 49 (35.8) | 50 (37.6) | 50 (37.6) | 34 (38.2) | 32 (36.0) | 16 (36.4) | 18 (40.9) |
| West | 59 (43.1) | 20 (14.6) | 53 (39.8) | 27 (20.3) | 35 (39.3) | 18 (20.2) | 18 (40.9) | 9 (20.5) |
| Age | | | | | | | | |
| Range | Min = 5, Max = 18 | Min = 5, Max = 18 | Min = 5, Max = 18 | Min = 5, Max = 18 | Min = 5, Max = 11 | Min = 5, Max = 11 | Min = 12, Max = 18 | Min = 12, Max = 18 |
| Mean and SD | M = 9.58, SD = 4.17 | M = 9.49, SD = 4.20 | M = 9.61, SD = 4.05 | M = 9.5, SD = 4.02 | M = 7.17, SD = 2.05 | M = 7.03, SD = 1.96 | M = 14.55, SD = 2.14 | M = 14.48, SD = 1.95 |
| Total n | 137 | 137 | 133 | 133 | 89 | 89 | 44 | 44 |

The same matched samples were used for the Find the Animal and Silhouettes tests and the same matched samples were used for the Matching and Shape Learning tests

Materials and Procedure

The PdPVTS was administered on an iOS or Windows tablet device, or on a Windows desktop or laptop computer. Examinees were seated beside the examiner, who guided each examinee through the instructions for the practice screens and for each of the formal tests. The PdPVTS comprises four visual tests and one verbal test: Find the Animal (visual scanning and classification), Matching (visual recognition), Shape Learning (visual recognition), Silhouettes (visual organization), and Story Questions (verbal recognition). Each test took 3 to 5 min to administer, for a total administration time of no more than 25 min.

Data Analyses

The aim of the present study was to examine the fairness of the PdPVTS by considering whether there were differences in PdPVTS outcomes (pass vs. fail) and PdPVTS scores (mean total score) across gender and racial/ethnic groups. Two sets of analyses were used to examine differences in PdPVTS outcomes and scores between groups: an adverse impact approach was used to consider whether there were differences in the rates of PdPVTS outcomes between groups and an equivalence testing approach was used to explore differences in PdPVTS total scores and to establish evidence related to any meaningful differences between groups.
Adverse Impact
An adverse impact analysis (for an overview, see Biddle, 2017) was used to compare the rates at which male vs. female, Black vs. White, and Hispanic vs. White examinees passed the PdPVTS. Adverse impact analysis is typically used in contexts where examinees are being selected for specific opportunities (e.g., employment, housing, resource allocation) and there is a desire to understand whether a legally protected or minority group is being selected at a meaningfully lower rate than a non-protected group. The test or criterion used to facilitate selection is said to have an adverse impact when the protected group is selected at a rate that is less than 80% of the rate for the non-protected group (i.e., the 4/5 rule; see Hough et al., 2001 for an overview). In the current study, we considered the ratio of pass rates between groups to determine whether there were any meaningful differences in the rates at which different gender and racial/ethnic groups passed the PdPVTS and whether the PdPVTS could have an adverse impact. Fisher’s exact test was then used to test more formally whether there were statistically significant differences in pass/fail frequencies between groups.
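The ratio and the 4/5 rule described above can be illustrated with a minimal sketch. The function names are hypothetical, and treating only ratios strictly below 0.80 as flagged is an assumption about how the boundary case is handled. The example uses the Find the Animal pass rates for female and male examinees reported in Table 4.

```python
def adverse_impact_ratio(focal_pass_rate, reference_pass_rate):
    """Pass rate of the focal group divided by that of the reference group."""
    if reference_pass_rate <= 0:
        raise ValueError("reference pass rate must be positive")
    return focal_pass_rate / reference_pass_rate


def flags_adverse_impact(ratio, threshold=0.80):
    """Apply the 4/5 rule (assumption: a ratio strictly below 0.80 is flagged)."""
    return ratio < threshold


# Find the Animal (Table 4): 98.1% of female and 100% of male examinees passed.
ratio = adverse_impact_ratio(98.1, 100.0)
print(round(ratio, 2), flags_adverse_impact(ratio))  # 0.98 False
```

The same two calls can be repeated for each test and each group comparison to build a table of ratios.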
Evidence of Equivalence Between Groups
Two sets of analyses were used to examine meaningful differences between the demographic groups of interest. First, Mann–Whitney U tests were used to explore whether there were significant differences in mean total PdPVTS scores between groups. It was hypothesized that there would be no significant differences in mean total score between groups. The Two One-Sided Tests (TOST) procedure was then used to determine whether there was evidence of equivalence across groups. The TOST procedure involves determining what constitutes the Smallest Effect Size of Interest (SESOI) and using this effect size to establish upper and lower equivalence bounds. The observed data are then compared against each of the two bounds using two one-sided t-tests: one testing the null hypothesis that the effect is at least as large as the upper bound and the other testing the null hypothesis that the effect is at least as small as the lower bound. If both null hypotheses can be rejected, this demonstrates that the observed effect falls within the equivalence bounds and that the groups of interest are practically equivalent (see Lakens, 2017 and Lakens et al., 2018 for an overview of the TOST approach). All TOST pairs used in the current study employed a SESOI of Cohen’s d = 0.49. This SESOI was selected for multiple reasons, including the fact that a d value of 0.49 corresponds to approximately half a standard deviation, which is consistent with the commonly used Minimal Important Difference criteria (see Copay et al., 2007 for an overview). Moreover, in keeping with the Neyman–Pearson approach, a SESOI of d = 0.49 would balance the risk of type I and type II error by yielding a reasonably high level of statistical power, given the sometimes modest cell sizes used in the current study (Lakens et al., 2018).
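The TOST logic described above can be sketched from summary statistics alone. This is an illustration under stated assumptions, not the authors' analysis script: the exact TOST variant is not named in the text, so Welch's unequal-variance form is assumed (the degrees of freedom tabled in the Results are fractional), the SESOI is converted to raw units with the root-mean-square of the two SDs, and p-values use a normal approximation to the t distribution to keep the example dependency-free. The function name `tost_welch` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def tost_welch(m1, sd1, n1, m2, sd2, n2, sesoi_d=0.49, alpha=0.05):
    """Two one-sided tests for equivalence from group summary statistics."""
    diff = m1 - m2
    # Assumption: standardize with the root-mean-square of the two SDs.
    bound = sesoi_d * sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    se = sqrt(v1 + v2)
    # Welch-Satterthwaite degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    t_lower = (diff + bound) / se  # H0: diff <= -bound
    t_upper = (diff - bound) / se  # H0: diff >= +bound
    z = NormalDist()
    # Normal approximation to the t distribution (reasonable for large df).
    p_lower = 1 - z.cdf(t_lower)
    p_upper = z.cdf(t_upper)
    return {
        "t_lower": t_lower,
        "t_upper": t_upper,
        "df": df,
        "p_lower": p_lower,
        "p_upper": p_upper,
        "equivalent": p_lower < alpha and p_upper < alpha,
    }


# Find the Animal, male vs. female, using the means and SDs from Table 4.
result = tost_welch(24.95, 0.25, 209, 24.78, 1.26, 209)
print({k: round(v, 3) if isinstance(v, float) else v for k, v in result.items()})
```

With the rounded means and SDs from Table 4, the two t statistics and the degrees of freedom come out close to, but not identical with, the values reported in Table 12, as expected when starting from rounded inputs.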

Results

Adverse Impact

To enable the adverse impact analyses, total PdPVTS scores (see Tables 4, 5, and 6 for descriptive statistics) were first generated and compared against the established age-adjusted cut scores (McCaffrey et al., 2020; see Table 7) to establish whether each examinee had passed or failed. The pass rates were then used to create ratios expressing the proportion of females passing the PdPVTS relative to males, along with the proportions of Black and Hispanic examinees passing the PdPVTS relative to White examinees (see Table 8). As predicted, for all tests in the PdPVTS and for each comparison of interest, the ratios were greater than 0.80, demonstrating that there was no evidence of adverse impact associated with the PdPVTS. To confirm that there were no statistically significant differences in pass/fail rates between demographic groups, a series of Fisher’s exact tests was used to compare pass/fail frequencies between groups for each test. Ultimately, none of the Fisher’s exact tests was significant (p ranged from 0.12 to 0.99, see Tables 4, 5, and 6).
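For readers who want to see how a Fisher’s exact test on pass/fail frequencies works, the following is a minimal standard-library sketch. It uses the common two-sided definition (summing the probabilities of all tables with the same margins that are no more likely than the observed one), which is an assumption because the text does not state the variant used. The pass/fail counts are back-calculated from the pass rates reported in Table 4 and are likewise an assumption.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_value = sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)  # small tolerance for float ties
    )
    return min(1.0, p_value)


# Find the Animal, male vs. female: rows are groups, columns are pass/fail.
# Counts are derived from the reported rates (100% of 209; 98.1% of 209 is
# about 205), not taken directly from the article.
p = fisher_exact_two_sided(209, 0, 205, 4)
print(round(p, 2))  # 0.12
```

The result agrees with the p value reported for Find the Animal in the male vs. female comparison.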
Table 4
Descriptive statistics, pass rate, and Fisher’s exact test for male vs. female

| Test | N | Male n | Male M | Male SD | Male pass % | Female n | Female M | Female SD | Female pass % | Fisher’s exact test (p) |
|---|---|---|---|---|---|---|---|---|---|---|
| Find the Animal | 418 | 209 | 24.95 | 0.25 | 100 | 209 | 24.78 | 1.26 | 98.1 | 0.12 |
| Matching | 406 | 203 | 24.46 | 1.82 | 97.5 | 203 | 24.42 | 2.57 | 97.0 | 0.99 |
| Shape Learning | 406 | 203 | 23.93 | 2.29 | 97.5 | 203 | 24.11 | 2.27 | 97.5 | 0.99 |
| Silhouettes | 418 | 209 | 24.21 | 1.61 | 99.0 | 209 | 24.09 | 1.90 | 97.1 | 0.28 |
| Story Questions (ages 7–11) | 276 | 138 | 16.70 | 2.09 | 99.3 | 138 | 17.04 | 1.67 | 100.0 | 0.99 |
| Story Questions (ages 12–18) | 130 | 65 | 19.31 | 1.69 | 96.9 | 65 | 19.72 | 1.17 | 98.5 | 0.99 |
Table 5
Descriptive statistics, pass rate, and Fisher’s exact test for Black vs. White

| Test | N | Black n | Black M | Black SD | Black pass % | White n | White M | White SD | White pass % | Fisher’s exact test (p) |
|---|---|---|---|---|---|---|---|---|---|---|
| Find the Animal | 154 | 77 | 24.52 | 2.64 | 97.4 | 77 | 24.97 | 0.23 | 98.1 | 0.49 |
| Matching | 152 | 76 | 24.36 | 1.92 | 96.1 | 76 | 24.67 | 1.00 | 97.0 | 0.62 |
| Shape Learning | 152 | 76 | 23.79 | 2.49 | 97.4 | 76 | 24.30 | 1.29 | 97.5 | 0.49 |
| Silhouettes | 154 | 77 | 24.00 | 2.18 | 96.1 | 77 | 24.00 | 1.50 | 97.1 | 0.62 |
| Story Questions (ages 7–11) | 104 | 52 | 15.88 | 2.43 | 98.1 | 52 | 17.06 | 1.61 | 100 | 0.99 |
| Story Questions (ages 12–18) | 48 | 24 | 19.83 | 0.48 | 100 | 24 | 19.92 | 0.28 | 100 | 0.99 |
Table 6
Descriptive statistics, pass rate, and Fisher’s exact test for Hispanic vs. White

| Test | N | Hispanic n | Hispanic M | Hispanic SD | Hispanic pass % | White n | White M | White SD | White pass % | Fisher’s exact test (p) |
|---|---|---|---|---|---|---|---|---|---|---|
| Find the Animal | 274 | 137 | 24.93 | 0.36 | 99.3 | 137 | 24.80 | 1.42 | 98.5 | 0.99 |
| Matching | 266 | 133 | 24.53 | 1.69 | 97.0 | 133 | 24.55 | 2.30 | 98.5 | 0.99 |
| Shape Learning | 266 | 133 | 23.99 | 2.50 | 96.2 | 133 | 24.07 | 2.18 | 97.7 | 0.72 |
| Silhouettes | 274 | 137 | 24.06 | 2.14 | 96.4 | 137 | 24.07 | 1.74 | 98.5 | 0.21 |
| Story Questions (ages 7–11) | 178 | 89 | 16.89 | 1.77 | 97.8 | 89 | 16.76 | 1.90 | 98.9 | 0.99 |
| Story Questions (ages 12–18) | 88 | 44 | 19.02 | 2.25 | 95.5 | 44 | 19.61 | 1.42 | 97.7 | 0.99 |
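Table 7
Age-adjusted cut scores by test

| Age group (years) | Find the Animal: Fail | Find the Animal: Pass | Matching: Fail | Matching: Pass | Shape Learning: Fail | Shape Learning: Pass | Silhouettes: Fail | Silhouettes: Pass | Story Questions: Fail | Story Questions: Pass |
|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 0–22 | 23–25 | 0–17 | 18–25 | 0–15 | 16–25 | 0–18 | 19–25 | n/a | n/a |
| 6 | 0–22 | 23–25 | 0–21 | 22–25 | 0–15 | 16–25 | 0–20 | 21–25 | n/a | n/a |
| 7–11 | 0–22 | 23–25 | 0–21 | 22–25 | 0–19 | 20–25 | 0–20 | 21–25 | 0–12 | 13–18 |
| 12–18 | 0–22 | 23–25 | 0–21 | 22–25 | 0–19 | 20–25 | 0–20 | 21–25 | 0–15 | 16–20 |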
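The age-adjusted cut scores in Table 7 amount to a lookup from age band and test to a minimum passing score. The sketch below encodes the table that way; the dictionary layout and the function name `classify` are illustrative choices, not part of the PdPVTS software.

```python
# Minimum passing total score by age band (inclusive) and test, from Table 7.
# Story Questions has no cut score for ages 5 and 6 (n/a in the table).
CUT_SCORES = {
    (5, 5): {"Find the Animal": 23, "Matching": 18,
             "Shape Learning": 16, "Silhouettes": 19},
    (6, 6): {"Find the Animal": 23, "Matching": 22,
             "Shape Learning": 16, "Silhouettes": 21},
    (7, 11): {"Find the Animal": 23, "Matching": 22, "Shape Learning": 20,
              "Silhouettes": 21, "Story Questions": 13},
    (12, 18): {"Find the Animal": 23, "Matching": 22, "Shape Learning": 20,
               "Silhouettes": 21, "Story Questions": 16},
}


def classify(test, age, score):
    """Return 'pass' or 'fail', or None when the test has no cut score at that age."""
    for (low, high), cuts in CUT_SCORES.items():
        if low <= age <= high:
            if test not in cuts:
                return None
            return "pass" if score >= cuts[test] else "fail"
    raise ValueError(f"age {age} is outside the 5 to 18 range covered by Table 7")


print(classify("Matching", 5, 18))          # pass
print(classify("Matching", 6, 18))          # fail
print(classify("Story Questions", 6, 12))   # None
```

Keeping the minimum passing score rather than both ranges avoids storing redundant bounds, since each Fail range ends one point below the Pass range.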
Table 8
Adverse impact ratios

| Test | Female to male | Black to White | Hispanic to White |
|---|---|---|---|
| Find the Animal | 0.98 | 0.99 | 1.01 |
| Matching | 0.99 | 0.99 | 0.98 |
| Shape Learning | 1.00 | 1.00 | 0.98 |
| Silhouettes | 0.98 | 0.99 | 0.98 |
| Story Questions (ages 7–11) | 1.01 | 0.98 | 0.99 |
| Story Questions (ages 12–18) | 1.02 | 1.00 | 0.98 |

Evidence of Equivalence Between Groups

The Mann–Whitney U tests revealed that when comparing male to female examinees, there was a significant difference in mean total score between groups for the PdPVTS Story Questions among examinees aged 12 to 18 (U = 1799, p = 0.039); however, the effect size was small (r = −0.18, see Table 9). No other significant differences between males and females were observed. When comparing Black to White examinees, there was a significant difference in mean total score between groups for the Story Questions among examinees aged 7 to 11 (U = 972.5, p = 0.008); however, the effect size was small (r = −0.26, see Table 10). No other significant differences between the Black and White examinees were observed. When comparing Hispanic to White examinees, no significant differences between groups were observed (p ranged from 0.160 to 0.797, see Table 11).
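The U statistics and effect sizes reported for these comparisons can be cross-checked from the tabled mean ranks and Z values. The sketch below assumes the standard rank-sum identity U = n × (mean rank) − n(n + 1)/2 for the first group and the common convention r = Z/√N. Neither formula is stated in the text, but both reproduce the values in Tables 9 and 10; the function names are illustrative.

```python
from math import sqrt


def u_from_mean_rank(n_group, mean_rank):
    """Mann-Whitney U for a group, from its size and mean rank in the pooled sample."""
    rank_sum = n_group * mean_rank
    return rank_sum - n_group * (n_group + 1) / 2


def effect_size_r(z, n_total):
    """Effect size r = Z / sqrt(N) (assumed convention)."""
    return z / sqrt(n_total)


# Story Questions (ages 12-18), male vs. female (Table 9).
print(round(u_from_mean_rank(65, 60.68)))   # 1799
print(round(effect_size_r(-2.06, 130), 2))  # -0.18

# Story Questions (ages 7-11), Black vs. White (Table 10).
print(round(effect_size_r(-2.65, 104), 2))  # -0.26
```

Because the two mean ranks in an equal-sized comparison must sum to N + 1, the same identity also offers a quick consistency check on any single row.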
Table 9
Mann–Whitney U tests for male vs. female

| Test | N | Male n | Male mean rank (out of N) | Female n | Female mean rank (out of N) | U | Z | p | Effect size (r) |
|---|---|---|---|---|---|---|---|---|---|
| Find the Animal | 418 | 209 | 213.10 | 209 | 205.90 | 22,593 | 1.54 | 0.123 | 0.08 |
| Matching | 406 | 203 | 199.38 | 203 | 207.62 | 19,768 | −1.04 | 0.300 | −0.05 |
| Shape Learning | 406 | 203 | 200.63 | 203 | 206.37 | 20,022.5 | −0.58 | 0.560 | −0.03 |
| Silhouettes | 418 | 209 | 210.11 | 209 | 208.89 | 21,967.5 | 0.12 | 0.905 | 0.01 |
| Story Questions (ages 7–11) | 276 | 138 | 133.83 | 138 | 143.17 | 8877 | −1.10 | 0.272 | −0.07 |
| Story Questions (ages 12–18) | 130 | 65 | 60.68 | 65 | 70.32 | 1799 | −2.06 | 0.039 | −0.18 |
Table 10
Mann–Whitney U tests for Black vs. White

| Test | N | Black n | Black mean rank (out of N) | White n | White mean rank (out of N) | U | Z | p | Effect size (r) |
|---|---|---|---|---|---|---|---|---|---|
| Find the Animal | 154 | 77 | 75.01 | 77 | 79.99 | 2773 | −1.92 | 0.056 | −0.16 |
| Matching | 152 | 76 | 73.73 | 76 | 79.27 | 2677.5 | −1.10 | 0.271 | −0.09 |
| Shape Learning | 152 | 76 | 75.43 | 76 | 77.57 | 2806.5 | −0.36 | 0.723 | −0.03 |
| Silhouettes | 154 | 77 | 80.97 | 77 | 74.03 | 3232 | 1.07 | 0.286 | 0.09 |
| Story Questions (ages 7–11) | 104 | 52 | 45.20 | 52 | 59.80 | 972.5 | −2.65 | 0.008 | −0.26 |
| Story Questions (ages 12–18) | 48 | 24 | 23.96 | 24 | 25.04 | 275 | −0.51 | 0.627 | −0.07 |
Table 11
Mann–Whitney U tests for Hispanic vs. White

| Test | N | Hispanic n | Hispanic mean rank (out of N) | White n | White mean rank (out of N) | U | Z | p | Effect size (r) |
|---|---|---|---|---|---|---|---|---|---|
| Find the Animal | 274 | 137 | 137.04 | 137 | 137.96 | 9322 | −0.26 | 0.797 | −0.02 |
| Matching | 266 | 133 | 132.67 | 133 | 134.33 | 8734.5 | −0.27 | 0.789 | −0.02 |
| Shape Learning | 266 | 133 | 134.86 | 133 | 132.14 | 9025 | 0.35 | 0.730 | 0.02 |
| Silhouettes | 274 | 137 | 143.37 | 137 | 131.63 | 10,189 | 1.41 | 0.160 | 0.09 |
| Story Questions (ages 7–11) | 178 | 89 | 90.93 | 89 | 88.07 | 4087.5 | 0.41 | 0.684 | 0.03 |
| Story Questions (ages 12–18) | 88 | 44 | 41.78 | 44 | 47.22 | 848.5 | −1.39 | 0.167 | −0.15 |
Given that there was little evidence of differences in mean total score between groups, the TOST procedure was used to consider whether evidence of equivalence between groups could be established. When comparing male to female examinees, evidence of equivalence (i.e., the null hypothesis was rejected for both one-sided t-tests) was observed for all tests (p < 0.01) except for the Story Questions for examinees aged 12 to 18 (p = 0.124, see Table 12). When comparing Black to White examinees, evidence of equivalence was observed for the Matching and Silhouettes tests and for the Story Questions for examinees aged 12 to 18 (p < 0.05, see Table 13). When comparing Hispanic to White examinees, evidence of equivalence was observed for all tests (p < 0.01) except for the Story Questions for examinees aged 12 to 18 (p = 0.207, see Table 14). Ultimately, evidence of equivalence was observed for most of the TOST pairs that were run. Instances where evidence of equivalence was not observed were generally associated with smaller cell sizes and comparatively lower statistical power.
Table 12
Two One-Sided Tests (TOST) exploring equivalence between males vs. females

| Test | N | Male n | Female n | Upper bound t | Upper bound df | Upper bound p | Lower bound t | Lower bound df | Lower bound p | Power (1 − β) | Evidence of equivalence |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Find the Animal | 418 | 209 | 209 | 6.94 | 224.92 | < 0.001 | −3.08 | 224.92 | < 0.001 | 0.99 | Yes |
| Matching | 406 | 203 | 203 | 5.12 | 364.14 | < 0.001 | −4.76 | 364.14 | < 0.001 | 0.99 | Yes |
| Shape Learning | 406 | 203 | 203 | 4.13 | 403.97 | < 0.001 | −5.74 | 403.97 | < 0.001 | 0.99 | Yes |
| Silhouettes | 418 | 209 | 209 | 5.70 | 404.67 | < 0.001 | −4.32 | 404.67 | < 0.001 | 0.99 | Yes |
| Story Questions (ages 7–11) | 276 | 138 | 138 | 2.57 | 261.49 | 0.005 | −5.57 | 261.49 | < 0.001 | 0.98 | Yes |
| Story Questions (ages 12–18) | 130 | 65 | 65 | 1.16 | 113.84 | 0.124 | −4.43 | 113.84 | < 0.001 | 0.74 | No |

Evidence of equivalence is present when both the upper bound and lower bound t-tests are significant at p < 0.05
Table 13
Two One-Sided Tests (TOST) exploring equivalence between Black vs. White examinees

| Test | N | Black n | White n | Upper bound t | Upper bound df | Upper bound p | Lower bound t | Lower bound df | Lower bound p | Power (1 − β) | Evidence of equivalence |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Find the Animal | 154 | 77 | 77 | 1.54 | 77.13 | 0.064 | −4.54 | 77.13 | < 0.001 | 0.83 | No |
| Matching | 152 | 76 | 76 | 1.75 | 112.69 | 0.041 | −4.29 | 112.69 | < 0.001 | 0.83 | Yes |
| Shape Learning | 152 | 76 | 76 | 1.42 | 112.38 | 0.079 | −4.62 | 112.38 | < 0.001 | 0.83 | No |
| Silhouettes | 154 | 77 | 77 | 3.04 | 135.13 | < 0.001 | −3.04 | 135.13 | < 0.001 | 0.83 | Yes |
| Story Questions (ages 7–11) | 104 | 52 | 52 | −0.40 | 88.65 | 0.655 | −5.40 | 88.65 | < 0.001 | 0.60 | No |
| Story Questions (ages 12–18) | 48 | 24 | 24 | 3.57 | 37.14 | < 0.001 | −5.03 | 37.14 | < 0.001 | 0.07 | Yes |

Evidence of equivalence is present when both the upper bound and lower bound t-tests are significant at p < 0.05
Table 14
Two One-Sided Tests (TOST) exploring equivalence between Hispanic vs. White examinees

| Test | N | Hispanic n | White n | Upper bound t | Upper bound df | Upper bound p | Lower bound t | Lower bound df | Lower bound p | Power (1 − β) | Evidence of equivalence |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Find the Animal | 274 | 137 | 137 | 5.05 | 152.97 | < 0.001 | −3.07 | 152.97 | 0.001 | 0.98 | Yes |
| Matching | 266 | 133 | 133 | 3.90 | 242.51 | < 0.001 | −4.09 | 242.51 | < 0.001 | 0.98 | Yes |
| Shape Learning | 266 | 133 | 133 | 3.73 | 259.17 | < 0.001 | −4.26 | 259.17 | < 0.001 | 0.98 | Yes |
| Silhouettes | 274 | 137 | 137 | 3.99 | 261.18 | < 0.001 | −4.12 | 261.18 | < 0.001 | 0.98 | Yes |
| Story Questions (ages 7–11) | 178 | 89 | 89 | 3.72 | 175.24 | < 0.001 | −2.82 | 175.24 | 0.003 | 0.89 | Yes |
| Story Questions (ages 12–18) | 88 | 44 | 44 | 0.82 | 72.57 | 0.207 | −3.77 | 72.57 | < 0.001 | 0.48 | No |

Evidence of equivalence is present when both the upper bound and lower bound t-tests are significant at p < 0.05
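The link between cell size and TOST power noted in the Results can be made concrete with a rough calculation. The sketch below is a normal-approximation formula for two equal-sized groups, assuming a true effect of zero, symmetric bounds of d = ±0.49, and α = 0.05. The article does not state how its power values were computed, so this is an illustrative approximation rather than the authors' method, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def tost_power_equal_n(n_per_group, sesoi_d=0.49, alpha=0.05):
    """Approximate TOST power for two equal groups when the true effect is zero."""
    z = NormalDist()
    se_d = sqrt(2 / n_per_group)    # approximate standard error of Cohen's d
    z_crit = z.inv_cdf(1 - alpha)   # one-sided critical value
    power = 2 * z.cdf(sesoi_d / se_d - z_crit) - 1
    return max(0.0, power)


for n in (44, 52, 65, 77, 138, 209):
    print(n, round(tost_power_equal_n(n), 2))
```

For these cell sizes the approximation lands within about 0.01 to 0.02 of the power values in Tables 12 to 14. It is less accurate for very small cells such as n = 24, where the normal approximation to the t distribution is weakest.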

Discussion

When tests are used to classify individuals into specific groups, and especially when some of those classifications can result in adverse consequences for the individuals so classified (e.g., loss of compensation for injuries or denial of disability for those classified as giving suboptimal effort on the PdPVTS), it is critical to ensure that the obtained classifications are not associated with nominal cultural variables such as gender and race/ethnicity. Fairness in such classifications must be considered empirically. Herein, we have defined fairness in terms of adverse impact, with the adverse classification being a “failed” PdPVTS test, and examined such failure rates across gender and race/ethnicity. In addition to looking at classification rates, we also examined mean score equivalencies.
In every instance, classification/failure rates lacked adverse impact across the nominal variables of gender and race/ethnicity. In the majority of cases, but not all, this was accompanied by mean score equivalency across groups. Story Questions, the only verbal measure among the suite of five tests that make up the PdPVTS, did not always demonstrate mean score equivalence. Moreover, evidence of equivalence could not be established for the Find the Animal, Shape Learning, and Story Questions (ages 7–11) tests when comparing Black vs. White examinees. However, the lack of equivalence evidence for the Story Questions was largely due to the comparatively smaller sample sizes and lower power. The lack of equivalence evidence for some of the comparisons between Black and White examinees also appears to stem from a lack of power. The inference that low statistical power was the main reason that evidence of equivalence could not be established for select comparisons is supported by the observation that all the mean differences between groups and the associated effect sizes were quite small and that there was no evidence of adverse impact in the classification of individuals as either passing or failing any of the tests, at any age level. That said, follow-up with larger samples is needed to confirm whether this interpretation is accurate. Ultimately, the empirical evidence presented in the current investigation suggests that examiners may use the PdPVTS with confidence that the pass/fail classifications will not be associated with gender or race/ethnicity for Black, White, and Hispanic examinees. This lack of adverse impact in achieving such classifications is critically important for all PVTs and in all settings. When choosing a PVT, examiners should consider the existing evidence related to adverse impact of the classification rates of their chosen instrument across such nominal variables as gender and ethnicity.

Conclusions

As performance validity tests continue to play a more central role in psychoeducational and neuropsychological assessment, there is a clear need to ensure that PVTs are fair tests, in that classification/failure rates are similar across culturally diverse groups. Previous research has demonstrated that several well-established freestanding and embedded PVTs (e.g., the Rey-15 Item Test, Warrington Recognition Memory Test, Rey Word Recognition Test, the Dot Counting Test, the Reliable Digit Span, and Digit Span Age-corrected Scaled score) have higher failure rates associated with different demographic attributes. The current study provides strong evidence that PdPVTS pass/fail classification rates are not associated with gender or race/ethnicity for Black, White, and Hispanic examinees. Moreover, there was strong evidence of equivalency in terms of mean PdPVTS scores between the gender and racial/ethnic groups of interest. That said, evidence of equivalency could not be established for all PdPVTS tests when comparing racial/ethnic groups, albeit owing to comparatively smaller sample sizes for the non-White groups. Therefore, future studies should aim to consider equivalency between racial/ethnic groups in terms of their mean PdPVTS scores using larger, representative samples.

Declarations

Conflict of Interest

The authors of this paper were involved in the development of the PdPVTS and received financial benefits associated with the commercialization of the PdPVTS.
Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References
Biddle, D. (2017). Adverse impact and test validation: A practitioner’s guide to valid and defensible employment testing (2nd ed.). Routledge.
Hood, E. D., Boone, K. B., Miora, D. S., Cottingham, M. E., Victor, T. L., Zeigler, E. A., Zeller, M. A., & Wright, M. J. (2022). Are there differences in performance validity test scores between African American and White American neuropsychology clinic patients? Journal of Clinical and Experimental Neuropsychology, 44(1), 31–41, 10.1080.
McCaffrey, R. J., Lynch, J. K., Leark, R. A., & Reynolds, C. R. (2020). Pediatric Performance Validity Test Suite: Technical manual. Multi-Health Systems Inc.
Newman, D. A., Hanges, P. J., & Outtz, J. L. (2007). Racial groups and test fairness, considering history and construct validity. American Psychologist, 62(9), 1082–1083.
Reynolds, C. R., Altmann, R. A., & Allen, D. N. (2021). Mastering modern psychological testing: Theory and methods (2nd ed.). Switzerland.
Salazar, X. F., Lu, P. H., & Boone, K. B. (2021). The use of performance validity tests in ethnic-minority and non-English-dominant populations. In K. B. Boone (Ed.), Assessment of feigned cognitive impairment: A neuropsychological perspective (pp. 578–608). Guilford Press.
Strutt, A. M., & Stinson, J. M. (2022). Performance validity testing with culturally diverse individuals and non-native English speakers. In R. W. Schroeder & P. K. Martin (Eds.), Validity assessment in clinical neuropsychological practice: Evaluating and managing noncredible performance (pp. 211–232). Guilford Press.
Sweet, J. J., Heilbronner, R. L., Morgan, J. E., Larrabee, G. L., Rohling, M. L., Boone, K. B., Kirkwood, M. W., Schroeder, R. W., Suhr, J. A., & Conference Participants. (2021). American Academy of Clinical Neuropsychology (AACN) 2021 consensus statement on validity assessment: Update of the 2009 AACN consensus conference statement on neuropsychological assessment of effort, response bias, and malingering. The Clinical Neuropsychologist, 35(6), 1053–1106. https://doi.org/10.1080/13854046.2021.1896036
Metadata
Title
Assessment of Cultural Bias on the PdPVTS Across Gender and Racial/Ethnic Groups
Authors
Robert J. McCaffrey
Cecil R. Reynolds
Julie K. Lynch
Robert A. Leark
Robert Ramkhalawansingh
Publication date
1 March 2023
Publisher
Springer International Publishing
Published in
Journal of Pediatric Neuropsychology / Issue 1/2023
Print ISSN: 2199-2681
Electronic ISSN: 2199-2673
DOI
https://doi.org/10.1007/s40817-022-00133-1
