

Open Access, 1 March 2023

Assessment of Cultural Bias on the PdPVTS Across Gender and Racial/Ethnic Groups

Authors: Robert J. McCaffrey, Cecil R. Reynolds, Julie K. Lynch, Robert A. Leark, Robert Ramkhalawansingh

Published in: Journal of Pediatric Neuropsychology | Issue 1/2023

Abstract

Performance validity assessment has become a standard component of psychoeducational and neuropsychological evaluations (Sweet et al., 2021), but there is evidence that cultural variables can influence performance validity test outcomes. Given the critical role that performance validity assessment plays in determining the accuracy of neuropsychological and psychoeducational test data, it is imperative that these tests be unbiased across culturally diverse samples. The purpose of the current investigation was to examine the impact of race/ethnicity and gender on the Pediatric Performance Validity Test Suite (PdPVTS; McCaffrey, Lynch, Leark, & Reynolds, 2020). A general population sample of n = 838 examinees was collected, and demographically matched subsamples were established to compare gender and racial/ethnic groups while controlling for other confounding demographic variables. Classification/failure rates revealed little evidence of adverse impact across gender and racial/ethnic groups. Mean score equivalence between demographic groups was established in most cases. Instances where evidence of equivalence could not be established, chiefly for the Story Questions, were associated with smaller sample sizes and lower statistical power. Taken together, the current study demonstrates that examiners may use the PdPVTS with confidence that pass/fail classifications will not be associated with gender or race/ethnicity for Black, White, and Hispanic examinees.

The original online version of this article was revised for retrospective open access.

Guest Action Editor Dr. Arthur McNeill Horton oversaw the review and acceptance of this article.

A correction to this article is available online at https://doi.org/10.1007/s40817-022-00139-9.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Introduction

Test fairness is a fundamental principle of test development and refers to the extent to which a test measures the construct of interest and not characteristics of a specific group. Test fairness is linked to test bias, which is defined as empirically established systematic error in a test score. When the systematic error is attributable to a cultural/ethnic variable, the test is considered culturally biased (Newman et al., 2007; Reynolds et al., 2021).

The potential cultural bias of standard neuropsychological and psychoeducational tests has been an area of discussion and research for decades. As performance validity assessment has become a standard component of psychoeducational and neuropsychological evaluations (Sweet et al., 2021), research on the possible influence of cultural variables on performance validity tests has grown. To date, this research points to the relevance of several culture-related variables for performance validity tests, including proficiency in English, bilingualism vs. monolingualism, educational attainment, and country of origin. For example, non-English-speaking individuals with low educational attainment may have higher failure rates when standard cut scores are applied to some freestanding performance validity tests, specifically the Test of Memory Malingering, the Rey-15 Item Test, and the b Test. Bilingual individuals (research has primarily examined Spanish–English speakers) may have higher failure rates on the Rey-15 Item Test, Warrington Recognition Memory Test, Rey Word Recognition Test, and Dot Counting Test. The Genuine Memory Impairment Profile from the Word Memory Test has been supported in research involving ethnically and linguistically diverse individuals, but use of the primary validity indices in isolation has not. The embedded performance validity indices Reliable Digit Span and Digit Span Age-corrected Scaled Score have been found to yield higher failure rates with standard cut scores among Hispanic and Native American individuals, bilingual (Spanish/English) individuals, and individuals with low educational attainment (Salazar et al., 2021; see Strutt & Stinson, 2022, for a review).

Limited research has examined the impact of race on performance validity tests, and the general finding is that racially diverse samples perform comparably on many performance validity tests. Hood et al. (2022) found that clinically referred Black and White American adults performed generally comparably across a number of freestanding and embedded performance validity tests; importantly, the two groups were equated for age and educational level. In a pediatric sample, Bosworth and Dodd (2020) found that Black, Hispanic, and Asian youth with a history of mild traumatic brain injury performed comparably on the Non-Verbal Medical Symptom Validity Test (NV-MSVT). Gender also had no impact on NV-MSVT performance.

Performance validity assessment is necessary for determining the accuracy of neuropsychological and psychoeducational test data and should be a component of every evaluation, including pediatric evaluations (Emhoff et al., 2018; Sweet et al., 2021). It is therefore essential that these tests be culturally fair, which necessitates validation within culturally diverse samples. The current study examined the impact of race/ethnicity and gender on the Pediatric Performance Validity Test Suite (PdPVTS; McCaffrey et al., 2020). The PdPVTS comprises five computer-based tests that were specifically designed to assess performance validity in children and adolescents. The principles of universal design were kept at the forefront in developing a test battery that is accessible to youth of varying physical and cognitive capabilities and that uses culture- and gender-neutral test stimuli. PdPVTS performance was examined across racial/ethnic groups and gender. Given the attention paid to cultural variables in developing the test stimuli, we expected that racial/ethnic and gender groups would perform comparably across the PdPVTS.

Methods

Participants

A general population sample of n = 838 examinees was collected. Sampling was stratified by age, gender, race/ethnic group, geographic region, and parental education level (PEL) to match the demographic composition of the US population. Data collection took place in the fall of 2017, so the demographic composition of the sample was matched to 2017 US Census data (2017 American Community Survey; United States Census Bureau, 2017). During the development of the PdPVTS, the objective was to begin with eight tests and eliminate those that performed less well, arriving at a final set of five tests that maximize sensitivity and specificity in discerning performance validity. Because of this design, not all examinees received each of the final PdPVTS tests; the number of examinees in the general population sample who completed a given PdPVTS test therefore ranged from n = 431 to n = 563.

Examinees were excluded from the sample if they had any uncorrected hearing, visual, or motor impairments that might affect their ability to use a tablet or other touchscreen devices to complete the PdPVTS. Examinees were also excluded if they could not communicate verbally. As this was a general population sample, examinees were excluded if they were diagnosed with any psychological, neurological, behavioral, or learning-related disorders. Examinees were excluded if the examiners observed any anomalies with respect to their interaction with the tablet or any of the tests in the PdPVTS (n = 2).

Given that the primary objective of the current investigation was to compare PdPVTS performance between genders and among racial/ethnic groups, demographically matched samples were selected for each set of comparisons to control for confounding demographic variables. To compare male and female examinees, subsamples were created for each PdPVTS test that were matched on age group, race/ethnicity, and parental education level. To compare Black with White examinees, and Hispanic with White examinees, subsamples were created for each test that were matched on age group and gender only, owing to sample size restrictions. Where possible, the same matched samples were used across multiple PdPVTS tests. The demographic composition of the male vs. female matched samples is summarized in Table 1, and the demographic characteristics of the Black vs. White and Hispanic vs. White matched subsamples are summarized in Tables 2 and 3, respectively.
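The matching algorithm itself is not spelled out here. The identical per-stratum counts in Tables 1–3 are consistent with exact matching within strata, which can be sketched as follows, assuming each examinee is a record carrying the stratum fields. The function name `exact_match`, the field names, and the random subsampling within each stratum are illustrative assumptions, not the authors' procedure.

```python
import random
from collections import defaultdict


def exact_match(group_a, group_b, strata_keys, seed=0):
    """Hypothetical helper: draw equal-sized subsamples of two groups so that
    every stratum (e.g., age group x gender) has the same count in each."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    cells_a, cells_b = defaultdict(list), defaultdict(list)
    for person in group_a:
        cells_a[tuple(person[k] for k in strata_keys)].append(person)
    for person in group_b:
        cells_b[tuple(person[k] for k in strata_keys)].append(person)

    matched_a, matched_b = [], []
    # Only strata present in both groups can contribute matched cases.
    for stratum in sorted(cells_a.keys() & cells_b.keys()):
        k = min(len(cells_a[stratum]), len(cells_b[stratum]))
        matched_a.extend(rng.sample(cells_a[stratum], k))
        matched_b.extend(rng.sample(cells_b[stratum], k))
    return matched_a, matched_b


# Usage with made-up records (not study data):
black = [{"id": i, "age_group": "5-11", "gender": "F"} for i in range(3)]
white = [{"id": 100 + i, "age_group": "5-11", "gender": "F"} for i in range(5)]
white.append({"id": 200, "age_group": "12-18", "gender": "M"})
a, b = exact_match(black, white, ["age_group", "gender"])
print(len(a), len(b))  # 3 3
```

Cases in strata that exist in only one group (here the single 12–18 male record) are dropped, which is one reason matched subsamples end up smaller than the full sample.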

Table 1

Demographics characteristics of the male vs. female matched samples

		PdPVTS test(s)
		Find the Animal, Silhouettes				Matching, Shape Learning				Story Questions (ages 7–11)				Story Questions (ages 12–18)
		Male		Female		Male		Female		Male		Female		Male		Female
		N	%	N	%	N	%	N	%	N	%	N	%	N	%	N	%
Parental education level	No high school diploma	18	8.6	18	8.6	21	10.3	21	10.3	15	10.9	15	10.9	6	9.2	6	9.2
	High school graduate	68	32.5	68	32.5	67	33.0	67	33.0	47	34.1	47	34.1	20	30.8	20	30.8
	Some college/associate’s	63	30.1	63	30.1	63	31.0	63	31.0	42	30.4	42	30.4	21	32.3	21	32.3
	Bachelor’s degree	47	22.5	47	22.5	37	18.2	37	18.2	23	16.7	23	16.7	14	21.5	14	21.5
	Graduate/professional degree	13	6.2	13	6.2	15	7.4	15	7.4	11	8.0	11	8.0	4	6.2	4	6.2
Race/ethnicity	Black	19	9.1	19	9.1	21	10.3	21	10.3	15	10.9	15	10.9	6	9.2	6	9.2
	Hispanic	47	22.5	47	22.5	47	23.2	47	23.2	31	22.5	31	22.5	16	24.6	16	24.6
	White	130	62.2	130	62.2	122	60.1	122	60.1	83	60.1	83	60.1	39	60.0	39	60.0
	Other	13	6.2	13	6.2	13	6.4	13	6.4	9	6.5	9	6.5	4	6.2	4	6.2
Region	Midwest	45	21.5	48	23.0	43	21.2	41	20.2	30	21.7	27	19.6	13	20.0	14	21.5
	Northeast	35	16.7	38	18.2	35	17.2	32	15.8	22	15.9	21	15.2	13	20.0	11	16.9
	South	75	35.9	75	35.9	80	39.4	78	38.4	57	41.3	55	39.9	23	35.4	23	35.4
	West	54	25.8	48	23.0	45	22.2	52	25.6	29	21.0	35	25.4	16	24.6	17	26.2
	Age range	Min = 5, Max = 18		Min = 5, Max = 18		Min = 5, Max = 18		Min = 5, Max = 18		Min = 5, Max = 11		Min = 5, Max = 11		Min = 12, Max = 18		Min = 12, Max = 18
	Age	M = 9.36, SD = 4.18		M = 9.27, SD = 4.09		M = 9.38, SD = 4.05		M = 9.37, SD = 4.06		M = 6.91, SD = 1.87		M = 6.94, SD = 1.91		M = 14.63, SD = 1.81		M = 14.54, SD = 2.12
Total n		209		209		203		203		138		138		65		65

The same matched samples were used for the Find the Animal and Silhouettes tests and the same matched samples were used for the Matching and Shape Learning tests

Table 2

Demographic characteristics of the Black vs. White matched samples

		PdPVTS test(s)
		Find the Animal, Silhouettes				Matching, Shape Learning				Story Questions (ages 7–11)				Story Questions (ages 12–18)
		Black		White		Black		White		Black		White		Black		White
		N	%	N	%	N	%	N	%	N	%	N	%	N	%	N	%
Gender	Male	39	50.6	39	50.6	35	46.1	35	46.1	24	46.2	24	46.2	11	45.8	11	45.8
Gender	Female	38	49.4	38	49.4	41	53.9	41	53.9	28	53.8	28	53.8	13	54.2	13	54.2
Parental education level	No high school diploma	8	10.4	1	1.3	8	10.5	1	1.3	5	9.6	1	1.9	3	12.5	0	0.0
	High school graduate	24	31.2	27	35.1	27	35.5	27	35.5	19	36.5	19	36.5	8	33.3	8	33.3
	Some college/associate’s	29	37.7	24	31.2	24	31.6	20	26.3	17	32.7	13	25.0	7	29.2	7	29.2
	Bachelor’s degree	13	16.9	24	31.2	10	13.2	22	28.9	6	11.5	14	26.9	4	16.7	8	33.3
	Graduate/professional degree	3	3.9	1	1.3	7	9.2	6	7.9	5	9.6	5	9.6	2	8.3	1	4.2
Region	Midwest	16	20.8	26	33.8	15	19.7	16	21.1	11	21.2	11	21.2	4	16.7	5	20.8
	Northeast	10	13.0	17	22.1	10	13.2	11	14.5	6	11.5	8	15.4	4	16.7	3	12.5
	South	44	57.1	23	29.9	44	57.9	34	44.7	30	57.7	22	42.3	14	58.3	12	50.0
	West	7	9.1	11	14.3	7	9.2	15	19.7	5	9.6	11	21.2	2	8.3	4	16.7
	Age range	Min = 5, Max = 18		Min = 5, Max = 18		Min = 5, Max = 18		Min = 5, Max = 18		Min = 5, Max = 11		Min = 5, Max = 11		Min = 12, Max = 18		Min = 12, Max = 18
	Age	M = 9.09, SD = 3.97		M = 9.09, SD = 4.34		M = 9.33, SD = 4.05		M = 9.21, SD = 3.98		M = 6.94, SD = 1.89		M = 6.85, SD = 1.88		M = 14.5, SD = 2.19		M = 14.33, SD = 1.95
Total n		77		77		76		76		52		52		24		24

The same matched samples were used for the Find the Animal and Silhouettes tests and the same matched samples were used for the Matching and Shape Learning tests

Table 3

Demographic characteristics of the Hispanic vs. White matched samples

		PdPVTS test(s)
		Find the Animal, Silhouettes				Matching, Shape Learning				Story Questions (ages 7–11)				Story Questions (ages 12–18)
		Hispanic		White		Hispanic		White		Hispanic		White		Hispanic		White
		N	%	N	%	N	%	N	%	N	%	N	%	N	%	N	%
Gender	Male	69	50.4	69	50.4	66	49.6	66	49.6	47	52.8	47	52.8	19	43.2	19	43.2
Gender	Female	68	49.6	68	49.6	67	50.4	67	50.4	42	47.2	42	47.2	25	56.8	25	56.8
Parental education level	No high school diploma	29	21.2	5	3.6	33	24.8	6	4.5	20	22.5	5	5.6	13	29.5	1	2.3
	High school graduate	47	34.3	49	35.8	41	30.8	42	31.6	28	31.5	29	32.6	13	29.5	13	29.5
	Some college/associate’s	36	26.3	37	27.0	36	27.1	40	30.1	24	27.0	26	29.2	12	27.3	14	31.8
	Bachelor’s degree	20	14.6	41	29.9	17	12.8	37	27.8	11	12.4	23	25.8	6	13.6	14	31.8
	Graduate/professional degree	5	3.6	5	3.6	6	4.5	8	6.0	6	6.7	6	6.7	0	0.0	2	4.5
Region	Midwest	13	9.5	41	29.9	14	10.5	31	23.3	9	10.1	20	22.5	5	11.4	11	25.0
	Northeast	16	11.7	27	19.7	16	12.0	25	18.8	11	12.4	19	21.3	5	11.4	6	13.6
	South	49	35.8	49	35.8	50	37.6	50	37.6	34	38.2	32	36.0	16	36.4	18	40.9
	West	59	43.1	20	14.6	53	39.8	27	20.3	35	39.3	18	20.2	18	40.9	9	20.5
	Age range	Min = 5, Max = 18		Min = 5, Max = 18		Min = 5, Max = 18		Min = 5, Max = 18		Min = 5, Max = 11		Min = 5, Max = 11		Min = 12, Max = 18		Min = 12, Max = 18
	Age	M = 9.58, SD = 4.17		M = 9.49, SD = 4.20		M = 9.61, SD = 4.05		M = 9.5, SD = 4.02		M = 7.17, SD = 2.05		M = 7.03, SD = 1.96		M = 14.55, SD = 2.14		M = 14.48, SD = 1.95
Total n		137		137		133		133		89		89		44		44

The same matched samples were used for the Find the Animal and Silhouettes tests and the same matched samples were used for the Matching and Shape Learning tests

Materials and Procedure

The PdPVTS was administered on an iOS or Windows tablet, or on a Windows desktop or laptop computer. Examinees were seated beside the examiner, who guided each examinee through the instructions for the practice screens and for each of the formal tests. The PdPVTS comprises four visual tests and one verbal test: Find the Animal (visual scanning and classification), Matching (visual recognition), Shape Learning (visual recognition), Silhouettes (visual organization), and Story Questions (verbal recognition). Each test took 3 to 5 min to administer, for a total administration time of no more than 25 min.

Data Analyses

The aim of the present study was to examine the fairness of the PdPVTS by considering whether PdPVTS outcomes (pass vs. fail) and PdPVTS scores (mean total score) differed across gender and racial/ethnic groups. Two sets of analyses were used: an adverse impact approach to test whether the rates of PdPVTS outcomes differed between groups, and an equivalence testing approach to examine PdPVTS total scores and establish whether any differences between groups were small enough to be considered practically negligible.

Adverse Impact

An adverse impact analysis (for an overview, see Biddle, 2017) was used to compare the rates at which male vs. female, Black vs. White, and Hispanic vs. White examinees passed the PdPVTS. Adverse impact analysis is typically used in contexts where examinees are being selected for specific opportunities (e.g., employment, housing, resource allocation) and there is a desire to understand whether a legally protected or minority group is being selected at a meaningfully lower rate than a non-protected group. A test or criterion used to facilitate selection is said to have an adverse impact when the protected group's selection rate is less than 80% of that of the non-protected group (i.e., the four-fifths rule; see Hough et al., 2001 for an overview). In the current study, we computed the ratio of pass rates between groups to determine whether different gender and racial/ethnic groups passed the PdPVTS at meaningfully different rates and, thus, whether the PdPVTS could have an adverse impact. Fisher’s exact test was then used to test more formally whether pass/fail frequencies differed significantly between groups.
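As a minimal sketch of these two computations, assuming pass/fail counts per group are available, the impact ratio divides the focal group's pass rate by the reference group's, and a two-sided Fisher's exact test can be computed from hypergeometric probabilities using only the standard library. The function names are hypothetical, and the software the authors used is not stated. The two-sided p-value here sums every table no more probable than the observed one, a common convention (R's `fisher.test` uses the same rule). The example counts are back-calculated from the rounded pass percentages in Table 4, so they are an inference rather than reported data.

```python
from math import comb


def impact_ratio(pass_focal, n_focal, pass_ref, n_ref):
    """Pass rate of the focal group divided by that of the reference group."""
    return (pass_focal / n_focal) / (pass_ref / n_ref)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups, columns are pass/fail. With all margins fixed, the count
    in cell a follows a hypergeometric distribution; the p-value sums the
    probabilities of every table no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # A tiny relative tolerance keeps numerically tied tables in the sum.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(p, 1.0)


# Female (focal) vs. male (reference) on Find the Animal.
female_pass, female_n = 205, 209
male_pass, male_n = 209, 209
ratio = impact_ratio(female_pass, female_n, male_pass, male_n)
p = fisher_exact_two_sided(female_pass, female_n - female_pass,
                           male_pass, male_n - male_pass)
print(f"impact ratio = {ratio:.2f}, adverse impact: {ratio < 0.8}, p = {p:.2f}")
# impact ratio = 0.98, adverse impact: False, p = 0.12
```

These reconstructed counts reproduce the 0.98 ratio in Table 8 and the p = 0.12 in Table 4 for this comparison.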

Evidence of Equivalence Between Groups

Two sets of analyses were used to examine meaningful differences between the demographic groups of interest. First, Mann–Whitney U tests were used to explore whether total PdPVTS scores differed significantly between groups. It was hypothesized that there would be no significant differences in total score between groups. The Two One-Sided Tests (TOST) procedure was then used to determine whether there was evidence of equivalence across groups. The TOST procedure involves specifying the Smallest Effect Size of Interest (SESOI) and using it to set upper and lower equivalence bounds. The observed data are then compared against each bound using two one-sided t-tests: one tests the null hypothesis that the effect is at least as large as the upper bound, and the other tests the null hypothesis that the effect is at least as small as (i.e., at or below) the lower bound. If both null hypotheses are rejected, the observed effect falls within the equivalence bounds and the groups of interest can be considered practically equivalent (see Lakens, 2017 and Lakens et al., 2018 for an overview of the TOST approach). All TOST pairs in the current study used a SESOI of Cohen’s d = 0.49. This SESOI was selected for several reasons, including that d = 0.49 corresponds to approximately half a standard deviation, which is consistent with commonly used Minimal Important Difference criteria (see Copay et al., 2007 for an overview). Moreover, in keeping with the Neyman–Pearson approach, a SESOI of d = 0.49 balances the risks of type I and type II error by yielding reasonably high statistical power given the sometimes modest cell sizes in the current study (Lakens et al., 2018).
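A minimal sketch of the TOST procedure for two independent groups, working from summary statistics, is shown below. Two details are assumptions rather than statements from the text: the use of Welch's unequal-variance t-test (suggested by the fractional df in Tables 12–14), and converting the SESOI of d = 0.49 into raw score points by multiplying it by sqrt((SD1² + SD2²)/2). With the rounded Table 5 statistics for Black vs. White examinees on Find the Animal, this yields t ≈ 1.55 and − 4.53 on about 77 df, close to the 1.54 and − 4.54 reported in Table 13. The Student t tail probability uses a standard continued-fraction evaluation of the regularized incomplete beta function so that only the standard library is needed; `t_sf`, `tost_welch`, and the other names are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Each iteration applies one even and one odd coefficient.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_sf(t, df):
    """Upper-tail probability P(T > t) for Student's t with df degrees of freedom."""
    tail = 0.5 * reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))
    return tail if t >= 0 else 1.0 - tail


def tost_welch(m1, sd1, n1, m2, sd2, n2, d_sesoi=0.49, alpha=0.05):
    """TOST equivalence test on the mean difference m1 - m2 (Welch t-tests)."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    se = math.sqrt(v1 + v2)
    # Welch-Satterthwaite degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    # Assumed conversion of the SESOI from Cohen's d to raw score points
    bound = d_sesoi * math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    diff = m1 - m2
    t_lower = (diff + bound) / se  # tests H0: diff <= -bound
    t_upper = (diff - bound) / se  # tests H0: diff >= +bound
    p_lower = t_sf(t_lower, df)    # P(T > t_lower)
    p_upper = t_sf(-t_upper, df)   # P(T < t_upper), by symmetry
    return {"t_lower": t_lower, "p_lower": p_lower,
            "t_upper": t_upper, "p_upper": p_upper, "df": df,
            "equivalent": max(p_lower, p_upper) < alpha}


# Black vs. White, Find the Animal (rounded summary statistics from Table 5)
res = tost_welch(24.52, 2.64, 77, 24.97, 0.23, 77)
print({k: round(v, 3) if isinstance(v, float) else v for k, v in res.items()})
```

A pair is declared equivalent only when both one-sided p-values fall below alpha; here the first one-sided p is about 0.06, so, as in Table 13, equivalence is not established for this comparison.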

Results

Adverse Impact

To enable the adverse impact analyses, each examinee's total score on each PdPVTS test (see Tables 4, 5, and 6 for descriptive statistics) was first compared against the established age-adjusted cut scores (McCaffrey et al., 2020; see Table 7) to determine whether the examinee had passed or failed. The pass rates were then used to create ratios expressing the proportion of females passing the PdPVTS relative to males, along with the proportions of Black and Hispanic examinees passing relative to White examinees (see Table 8). As predicted, for all PdPVTS tests and for each comparison of interest, the ratios were greater than 0.80, indicating no evidence of adverse impact associated with the PdPVTS. To confirm that there were no statistically significant differences in pass/fail rates between demographic groups, a series of Fisher’s exact tests was used to compare pass/fail frequencies between groups for each test. None of the Fisher’s exact tests was significant (p ranged from 0.12 to 0.99; see Tables 4, 5, and 6).

Table 4

Descriptive statistics, pass rate, and Fisher’s exact test for male vs. female

Test	N	Male n	Male M	Male SD	Male pass %	Female n	Female M	Female SD	Female pass %	Fisher’s exact test (p)
Find the Animal	418	209	24.95	0.25	100	209	24.78	1.26	98.1	0.12
Matching	406	203	24.46	1.82	97.5	203	24.42	2.57	97.0	0.99
Shape Learning	406	203	23.93	2.29	97.5	203	24.11	2.27	97.5	0.99
Silhouettes	418	209	24.21	1.61	99.0	209	24.09	1.90	97.1	0.28
Story Questions (ages 7–11)	276	138	16.70	2.09	99.3	138	17.04	1.67	100.0	0.99
Story Questions (ages 12–18)	130	65	19.31	1.69	96.9	65	19.72	1.17	98.5	0.99

Table 5

Descriptive statistics, pass rate, and Fisher’s exact test for Black vs. White

Test	N	Black n	Black M	Black SD	Black pass %	White n	White M	White SD	White pass %	Fisher’s exact test (p)
Find the Animal	154	77	24.52	2.64	97.4	77	24.97	0.23	98.1	0.49
Matching	152	76	24.36	1.92	96.1	76	24.67	1.00	97.0	0.62
Shape Learning	152	76	23.79	2.49	97.4	76	24.30	1.29	97.5	0.49
Silhouettes	154	77	24.00	2.18	96.1	77	24.00	1.50	97.1	0.62
Story Questions (ages 7–11)	104	52	15.88	2.43	98.1	52	17.06	1.61	100	0.99
Story Questions (ages 12–18)	48	24	19.83	0.48	100	24	19.92	0.28	100	0.99

Table 6

Descriptive statistics, pass rate, and Fisher’s exact test for Hispanic vs. White

Test	N	Hispanic n	Hispanic M	Hispanic SD	Hispanic pass %	White n	White M	White SD	White pass %	Fisher’s exact test (p)
Find the Animal	274	137	24.93	0.36	99.3	137	24.80	1.42	98.5	0.99
Matching	266	133	24.53	1.69	97.0	133	24.55	2.30	98.5	0.99
Shape Learning	266	133	23.99	2.50	96.2	133	24.07	2.18	97.7	0.72
Silhouettes	274	137	24.06	2.14	96.4	137	24.07	1.74	98.5	0.21
Story Questions (ages 7–11)	178	89	16.89	1.77	97.8	89	16.76	1.90	98.9	0.99
Story Questions (ages 12–18)	88	44	19.02	2.25	95.5	44	19.61	1.42	97.7	0.99

Table 7

Age-adjusted cut scores by test

	Test
	Find the Animal		Matching		Shape Learning		Silhouettes		Story Questions
Age group (years)	Fail	Pass	Fail	Pass	Fail	Pass	Fail	Pass	Fail	Pass
5	0–22	23–25	0–17	18–25	0–15	16–25	0–18	19–25	n/a	n/a
6	0–22	23–25	0–21	22–25	0–15	16–25	0–20	21–25	n/a	n/a
7–11	0–22	23–25	0–21	22–25	0–19	20–25	0–20	21–25	0–12	13–18
12–18	0–22	23–25	0–21	22–25	0–19	20–25	0–20	21–25	0–15	16–20
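The pass/fail classification step can be sketched as a lookup against the Table 7 cut scores, assuming the total score is an integer and age is given in whole years. The dictionary and function names are hypothetical, and the sketch does not check that a score lies within the test's valid range (0–25, or 0–18 and 0–20 for the two Story Questions age bands).

```python
# Minimum passing total score by test and age band, transcribed from Table 7.
# None marks an age band for which the test has no cut score (n/a).
PASS_MIN = {
    "Find the Animal": {"5": 23, "6": 23, "7-11": 23, "12-18": 23},
    "Matching":        {"5": 18, "6": 22, "7-11": 22, "12-18": 22},
    "Shape Learning":  {"5": 16, "6": 16, "7-11": 20, "12-18": 20},
    "Silhouettes":     {"5": 19, "6": 21, "7-11": 21, "12-18": 21},
    "Story Questions": {"5": None, "6": None, "7-11": 13, "12-18": 16},
}


def age_band(age):
    """Map an age in whole years to a Table 7 age group."""
    if age in (5, 6):
        return str(age)
    if 7 <= age <= 11:
        return "7-11"
    if 12 <= age <= 18:
        return "12-18"
    raise ValueError(f"age {age} is outside the 5-18 range covered by Table 7")


def classify(test, age, total_score):
    """Return 'pass' or 'fail' for one examinee's total score on one test."""
    minimum = PASS_MIN[test][age_band(age)]
    if minimum is None:
        raise ValueError(f"{test} has no cut score for age {age}")
    return "pass" if total_score >= minimum else "fail"


print(classify("Matching", 5, 18), classify("Matching", 6, 18))  # pass fail
```

The usage line shows why the cut scores are age-adjusted: the same Matching score of 18 passes at age 5 but fails at age 6.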

Table 8

Adverse impact ratios

Test	Female to male	Black to White	Hispanic to White
Find the Animal	0.98	0.99	1.01
Matching	0.99	0.99	0.98
Shape Learning	1.00	1.00	0.98
Silhouettes	0.98	0.99	0.98
Story Questions (ages 7–11)	1.01	0.98	0.99
Story Questions (ages 12–18)	1.02	1.00	0.98

Evidence of Equivalence Between Groups

The Mann–Whitney U tests revealed that, when comparing male with female examinees, there was a significant difference in total score for the PdPVTS Story Questions among examinees aged 12 to 18 (U = 1799, p = 0.039); however, the effect size was small (r = − 0.18; see Table 9). No other significant differences between males and females were observed. When comparing Black with White examinees, there was a significant difference in total score for the Story Questions among examinees aged 7 to 11 (U = 972.5, p = 0.008); however, the effect size was small (r = − 0.26; see Table 10). No other significant differences between Black and White examinees were observed. When comparing Hispanic with White examinees, no significant differences were observed (p ranged from 0.160 to 0.797; see Table 11).
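For readers who want to recompute such statistics, a minimal Mann–Whitney U sketch is shown below, using midranks for ties and a tie-corrected normal approximation without continuity correction (the absence of a continuity correction is an assumption about the authors' software). It returns U for the first sample, Z, and r = Z/√N; the r values in Tables 9–11 appear consistent with that formula (e.g., − 2.06/√130 ≈ − 0.18 for the Story Questions, ages 12–18). U can likewise be recovered from a table's mean ranks as U = mean rank × n1 − n1(n1 + 1)/2; for Find the Animal in Table 9, 213.10 × 209 − 21,945 ≈ 22,593. The function name and the toy data are hypothetical.

```python
import math


def mann_whitney(x, y):
    """Mann-Whitney U for sample x versus sample y.

    Uses midranks for ties and a tie-corrected normal approximation without
    continuity correction. Returns (U for x, z, effect size r = z / sqrt(N)).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    values = list(x) + list(y)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    tie_term = 0.0  # sum of (t^3 - t) over groups of tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean_u) / math.sqrt(var_u) if var_u > 0 else 0.0
    return u, z, z / math.sqrt(n)


u, z, r = mann_whitney([1, 2, 2], [2, 3, 3])
print(u, round(z, 3), round(r, 3))  # 1.0 -1.65 -0.674
```

A negative Z (and r) means the first sample tends to have lower ranks, matching the sign convention of Tables 9–11, where the first-listed group with the lower mean rank receives a negative Z.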

Table 9

Mann–Whitney’s U tests for male vs. female

Test	N	Male n	Male mean rank	Female n	Female mean rank	U	Z	p	Effect size (r)
Find the Animal	418	209	213.10	209	205.90	22,593	1.54	0.123	0.08
Matching	406	203	199.38	203	207.62	19,768	− 1.04	0.300	− 0.05
Shape Learning	406	203	200.63	203	206.37	20,022.5	− 0.58	0.560	− 0.03
Silhouettes	418	209	210.11	209	208.89	21,967.5	0.12	0.905	0.01
Story Questions (ages 7–11)	276	138	133.83	138	143.17	8877	− 1.10	0.272	− 0.07
Story Questions (ages 12–18)	130	65	60.68	65	70.32	1799	− 2.06	0.039	− 0.18

Table 10

Mann–Whitney’s U tests for Black vs. White

Test	N	Black n	Black mean rank	White n	White mean rank	U	Z	p	Effect size (r)
Find the Animal	154	77	75.01	77	79.99	2773	− 1.92	0.056	− 0.16
Matching	152	76	73.73	76	79.27	2677.5	− 1.10	0.271	− 0.09
Shape Learning	152	76	75.43	76	77.57	2806.5	− 0.36	0.723	− 0.03
Silhouettes	154	77	80.97	77	74.03	3232	1.07	0.286	0.09
Story Questions (ages 7–11)	104	52	45.20	52	59.80	972.5	− 2.65	0.008	− 0.26
Story Questions (ages 12–18)	48	24	23.96	24	25.04	275	− 0.51	0.627	− 0.07

Table 11

Mann–Whitney’s U tests for Hispanic vs. White

Test	N	Hispanic n	Hispanic mean rank	White n	White mean rank	U	Z	p	Effect size (r)
Find the Animal	274	137	137.04	137	137.96	9322	− 0.26	0.797	− 0.02
Matching	266	133	132.67	133	134.33	8734.5	− 0.27	0.789	− 0.02
Shape Learning	266	133	134.86	133	132.14	9025	0.35	0.730	0.02
Silhouettes	274	137	143.37	137	131.63	10,189	1.41	0.160	0.09
Story Questions (ages 7–11)	178	89	90.93	89	88.07	4087.5	0.41	0.684	0.03
Story Questions (ages 12–18)	88	44	41.78	44	47.22	848.5	− 1.39	0.167	− 0.15

Given that there was little evidence of differences in total score between groups, the TOST procedure was used to test whether evidence of equivalence between groups could be established. When comparing male with female examinees, evidence of equivalence (i.e., the null hypotheses of both one-sided t-tests were rejected) was observed for all tests (p < 0.01) except the Story Questions for examinees aged 12 to 18 (p = 0.124; see Table 12). When comparing Black with White examinees, evidence of equivalence was observed for Matching, Silhouettes, and the Story Questions for examinees aged 12 to 18 (p < 0.05), but not for Find the Animal, Shape Learning, or the Story Questions for examinees aged 7 to 11 (see Table 13). When comparing Hispanic with White examinees, evidence of equivalence was observed for all tests (p < 0.01) except the Story Questions for examinees aged 12 to 18 (p = 0.207; see Table 14). Overall, evidence of equivalence was observed for most of the TOST pairs that were run. Instances where evidence of equivalence was not observed were generally associated with smaller cell sizes and comparatively lower statistical power.

Table 12

Two One-Sided Tests (TOST) exploring equivalence between males vs. females

Test	N	Male n	Female n	Upper bound t	Upper bound df	Upper bound p	Lower bound t	Lower bound df	Lower bound p	Power (1-β)	Evidence of equivalence
Find the Animal	418	209	209	6.94	224.92	< 0.001	− 3.08	224.92	< 0.001	0.99	Yes
Matching	406	203	203	5.12	364.14	< 0.001	− 4.76	364.14	< 0.001	0.99	Yes
Shape Learning	406	203	203	4.13	403.97	< 0.001	− 5.74	403.97	< 0.001	0.99	Yes
Silhouettes	418	209	209	5.70	404.67	< 0.001	− 4.32	404.67	< 0.001	0.99	Yes
Story Questions (ages 7–11)	276	138	138	2.57	261.49	0.005	− 5.57	261.49	< 0.001	0.98	Yes
Story Questions (ages 12–18)	130	65	65	1.16	113.84	0.124	− 4.43	113.84	< 0.001	0.74	No

Evidence of equivalence is present when both the upper bound and lower bound t-tests are significant at p < 0.05

Table 13

Two One-Sided Tests (TOST) exploring equivalence between Black vs. White examinees

Test	N	Black n	White n	Upper bound t	Upper bound df	Upper bound p	Lower bound t	Lower bound df	Lower bound p	Power (1-β)	Evidence of equivalence
Find the Animal	154	77	77	1.54	77.13	0.064	− 4.54	77.13	< 0.001	0.83	No
Matching	152	76	76	1.75	112.69	0.041	− 4.29	112.69	< 0.001	0.83	Yes
Shape Learning	152	76	76	1.42	112.38	0.079	− 4.62	112.38	< 0.001	0.83	No
Silhouettes	154	77	77	3.04	135.13	< 0.001	− 3.04	135.13	< 0.001	0.83	Yes
Story Questions (ages 7–11)	104	52	52	− 0.40	88.65	0.655	− 5.40	88.65	< 0.001	0.60	No
Story Questions (ages 12–18)	48	24	24	3.57	37.14	< 0.001	− 5.03	37.14	< 0.001	0.07	Yes

Evidence of equivalence is present when both the upper bound and lower bound t-tests are significant at p < 0.05

Table 14

Two One-Sided Tests (TOST) exploring equivalence between Hispanic vs. White examinees

Test	N	Hispanic n	White n	Upper bound t	Upper bound df	Upper bound p	Lower bound t	Lower bound df	Lower bound p	Power (1-β)	Evidence of equivalence
Find the Animal	274	137	137	5.05	152.97	< 0.001	− 3.07	152.97	0.001	0.98	Yes
Matching	266	133	133	3.90	242.51	< 0.001	− 4.09	242.51	< 0.001	0.98	Yes
Shape Learning	266	133	133	3.73	259.17	< 0.001	− 4.26	259.17	< 0.001	0.98	Yes
Silhouettes	274	137	137	3.99	261.18	< 0.001	− 4.12	261.18	< 0.001	0.98	Yes
Story Questions (ages 7–11)	178	89	89	3.72	175.24	< 0.001	− 2.82	175.24	0.003	0.89	Yes
Story Questions (ages 12–18)	88	44	44	0.82	72.57	0.207	− 3.77	72.57	< 0.001	0.48	No

Evidence of equivalence is present when both the upper bound and lower bound t-tests are significant at p < 0.05

Discussion

When tests are used to classify individuals into specific groups, and especially when some of those classifications can have adverse consequences (e.g., loss of compensation for injuries or denial of disability benefits for those classified as giving suboptimal effort on the PdPVTS), it is critical to ensure that the obtained classifications are not associated with nominal cultural variables such as gender and race/ethnicity. Fairness in such classifications must be evaluated empirically. Herein, we defined fairness in terms of adverse impact, treating a “failed” PdPVTS test as the adverse classification, and examined failure rates across gender and race/ethnicity. In addition to classification rates, we also examined mean score equivalence.

In every instance, classification/failure rates showed no adverse impact across the nominal variables of gender and race/ethnicity. In most cases, this was accompanied by mean score equivalence across groups, but not in all. The Story Questions, the only verbal measure among the five tests that make up the PdPVTS, did not always demonstrate mean score equivalence. Moreover, evidence of equivalence could not be established for the Find the Animal, Shape Learning, and Story Questions (ages 7–11) tests when comparing Black vs. White examinees. However, the lack of equivalence evidence for the Story Questions appears to be largely due to the comparatively smaller sample sizes and lower power, and the lack of equivalence evidence for some of the Black vs. White comparisons also appears to stem from limited power. The inference that low statistical power was the main reason evidence of equivalence could not be established for these comparisons is supported by the observation that all mean differences between groups and their associated effect sizes were quite small, and that there was no evidence of adverse impact in classifying individuals as passing or failing any of the tests at any age level. That said, follow-up studies with larger samples are needed to confirm this interpretation. Ultimately, the empirical evidence presented in the current investigation suggests that examiners may use the PdPVTS with confidence that pass/fail classifications will not be associated with gender or race/ethnicity for Black, White, and Hispanic examinees. The absence of adverse impact in such classifications is critically important for all PVTs in all settings, and when choosing a PVT, examiners should consider the existing evidence regarding adverse impact of their chosen instrument's classification rates across nominal variables such as gender and ethnicity.

Conclusions

As performance validity tests continue to play a more central role in psychoeducational and neuropsychological assessment, there is a clear need to ensure that PVTs are fair tests, in that classification/failure rates are similar across culturally diverse groups. Previous research has demonstrated that several well-established freestanding and embedded PVTs (e.g., the Rey-15 Item Test, Warrington Recognition Memory Test, Rey Word Recognition Test, Dot Counting Test, Reliable Digit Span, and Digit Span Age-corrected Scaled Score) have higher failure rates associated with certain demographic attributes. The current study provides strong evidence that PdPVTS pass/fail classification rates are not associated with gender or race/ethnicity for Black, White, and Hispanic examinees. Moreover, there was strong evidence of equivalence in mean PdPVTS scores between the gender and racial/ethnic groups of interest. That said, evidence of equivalence could not be established for all PdPVTS tests when comparing racial/ethnic groups, likely owing to the comparatively smaller sample sizes for the non-White groups. Therefore, future studies should examine equivalence between racial/ethnic groups in mean PdPVTS scores using larger, representative samples.

Declarations

Conflict of Interest

The authors of this paper were involved in the development of the PdPVTS and received financial benefits associated with the commercialization of the PdPVTS.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

Biddle, D. (2017). Adverse impact and test validation: A practitioner's guide to valid and defensible employment testing (2nd ed.). Routledge.

Bosworth, C., & Dodd, J. N. (2020). Noncredible effort on the Nonverbal-Medical Symptom Validity Test (NV-MSVT): Impact on cognitive performance in pediatric mild traumatic brain injury. Applied Neuropsychology: Child, 9(4), 367–374. https://doi.org/10.1080/216229645.2020.1742717

Copay, A. G., Subach, B. R., Glassman, S. D., Polly, D. W., Jr., & Schuler, T. C. (2007). Understanding the minimum clinically important difference: A review of concepts and methods. The Spine Journal: Official Journal of the North American Spine Society, 7(5), 541–546. https://doi.org/10.1016/j.spinee.2007.01.008

Emhoff, S. M., Lynch, J. K., & McCaffrey, R. J. (2018). Performance and symptom validity testing in pediatric assessment: A review of the literature. Developmental Neuropsychology, 43(8), 671–707. https://doi.org/10.1080/87565641.2018.1525612

Hood, E. D., Boone, K. B., Miora, D. S., Cottingham, M. E., Victor, T. L., Zeigler, E. A., Zeller, M. A., & Wright, M. J. (2022). Are there differences in performance validity test scores between African American and White American neuropsychology clinic patients? Journal of Clinical and Experimental Neuropsychology, 44(1), 31–41. 10.1080.

Hough, L. M., Oswald, F. L., & Ployhart, R. E. (2001). Determinants, detection and amelioration of adverse impact in personnel selection procedures: Issues, evidence and lessons learned. International Journal of Selection and Assessment, 9(1–2), 152–194. https://doi.org/10.1111/1468-2389.00171

Lakens, D. (2017). Equivalence tests. Social Psychological and Personality Science, 8(4), 355–362. https://doi.org/10.1177/1948550617697177

Lakens, D., Scheel, A. M., & Isager, P. M. (2018). Equivalence testing for psychological research: A tutorial. Advances in Methods and Practices in Psychological Science, 1(2), 259–269. https://doi.org/10.1177/2515245918770963

McCaffrey, R. J., Lynch, J. K., Leark, R. A., & Reynolds, C. R. (2020). Pediatric Performance Validity Test Suite: Technical manual. Multi-Health Systems Inc.

Newman, D. A., Hanges, P. J., & Outtz, J. L. (2007). Racial groups and test fairness, considering history and construct validity. American Psychologist, 62(9), 1082–1083.

Reynolds, C. R., Altmann, R. A., & Allen, D. N. (2021). Mastering modern psychological testing: Theory and methods (2nd ed.). Switzerland.

Salazar, X. F., Lu, P. H., & Boone, K. B. (2021). The use of performance validity tests in ethnic-minority and non-English-dominant populations. In K. B. Boone (Ed.), Assessment of feigned cognitive impairment: A neuropsychological perspective (pp. 578–608). Guilford Press.

Strutt, A. M., & Stinson, J. M. (2022). Performance validity testing with culturally diverse individuals and non-native English speakers. In R. W. Schroeder & P. K. Martin (Eds.), Validity assessment in clinical neuropsychological practice: Evaluating and managing noncredible performance (pp. 211–232). Guilford Press.

Sweet, J. J., Heilbronner, R. L., Morgan, J. E., Larrabee, G. L., Rohling, M. L., Boone, K. B., Kirkwood, M. W., Schroeder, R. W., Suhr, J. A., & Conference Participants. (2021). American Academy of Clinical Neuropsychology (AACN) 2021 consensus statement on validity assessment: Update of the 2009 AACN consensus conference statement on neuropsychological assessment of effort, response bias, and malingering. The Clinical Neuropsychologist, 35(6), 1053–1106. https://doi.org/10.1080/13854046.2021.1896036

Title: Assessment of Cultural Bias on the PdPVTS Across Gender and Racial/Ethnic Groups
Authors: Robert J. McCaffrey, Cecil R. Reynolds, Julie K. Lynch, Robert A. Leark, Robert Ramkhalawansingh
Publication date: 1 March 2023
Publisher: Springer International Publishing
Published in: Journal of Pediatric Neuropsychology / Issue 1/2023
Print ISSN: 2199-2681
Electronic ISSN: 2199-2673
DOI: https://doi.org/10.1007/s40817-022-00133-1
