Abstract:
Validation is an important enterprise, especially when a test is a high-stakes one. Demographic variables such as gender and field of study can affect test results and their interpretation. Differential Item Functioning (DIF) analysis is one way to ensure that a test does not favor one group of test takers over others. This study investigated gender DIF in the reading comprehension subtest (35 items) of a high-stakes test using the three-step logistic regression procedure of Zumbo (1999). The participants were 3,398 test takers, both male and female, who took the test in question (the UTEPT) as a partial requirement for entering a PhD program at the University of Tehran. Three sets of criteria were applied: Cohen's (1988), Zumbo's (1999), and Jodoin and Gierl's (2001). The analysis revealed that, although the 35 items show "small" effect sizes according to Cohen's classification, they do not display DIF under the other two criteria. It can therefore be concluded that the reading comprehension subtest of the UTEPT favors neither males nor females.
Machine summary:
Differential Item Functioning (DIF) in Terms of Gender in the Reading Comprehension Subtest of a High-Stakes Test. Mohammad Salehi, Alireza Tayebi. Assistant Professor, Sharif University of Technology. M.
This study investigated DIF in terms of gender in the reading comprehension subtest (35 items) of a high-stakes test using a three-step logistic regression procedure (Zumbo, 1999).
The Research Question. Given the nature of the study, the following research question is put forward: Do the items in the reading comprehension section of the University of Tehran English Proficiency Test (UTEPT) exhibit DIF with regard to the gender of the participants?
However, according to McNamara and Roever (2006), the following four broad categories of methods are used for detecting DIF: (a) analyses based on item difficulty (comparing item difficulty estimates); (b) nonparametric approaches (procedures using contingency tables, Chi-square, and odds ratios); (c) item-response-theory-based approaches (one-, two-, and three-parameter analyses, which frequently compare the fit of statistical models); and (d) other approaches (including logistic regression, generalizability theory, and multifaceted measurement).
Other DIF studies include Geranpayeh and Kunnan (2007), who employed a DIF procedure in terms of age to investigate whether the test items on the listening section of the Certificate in Advanced English examination function differently for test takers in different age groups.
In addition, Park (2006) used a three-step logistic regression procedure for ordinal items to investigate DIF in ten writing prompts from the writing subtest of the Michigan English Language Assessment Battery (MELAB) and found that the effect sizes were far too small for few prompts (i.