Similar Documents
20 similar documents found (search time: 62 ms)
1.
Data from a large-scale performance assessment (N = 105,731) were analyzed with five differential item functioning (DIF) detection methods for polytomous items to examine the congruence among the methods. Two different versions of the item response theory (IRT) model-based likelihood ratio test, the logistic regression likelihood ratio test, the Mantel test, and the generalized Mantel-Haenszel test were compared. Results indicated some agreement among the five DIF detection methods. Because statistical power is a function of sample size, DIF detection results from extremely large data sets are not practically useful. As alternatives to the DIF detection methods, four IRT model-based indices of standardized impact and four observed-score indices of standardized impact for polytomous items were obtained and compared with the R² measures of logistic regression.

2.
A computer simulation study was conducted to determine the feasibility of using logistic regression procedures to detect differential item functioning (DIF) in polytomous items. One item in a simulated test of 25 items contained DIF; parameters for that item were varied to create three conditions of nonuniform DIF and one of uniform DIF. Item scores were generated using a generalized partial credit model, and the data were recoded into multiple dichotomies in order to use logistic regression procedures. Results indicate that logistic regression is powerful in detecting most forms of DIF; however, it required large amounts of data manipulation, and interpretation of the results was sometimes difficult. Some logistic regression procedures may be useful in the post hoc analysis of DIF for polytomous items.
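The generation step described above can be sketched in a few lines. This is a minimal illustration of generalized partial credit model (GPCM) category probabilities, not the study's simulation code; the parameter values used below are invented for the example.

```python
import math

def gpcm_probs(theta, a, b):
    """Category probabilities for a GPCM item.

    theta: latent trait value
    a: item discrimination
    b: list of step difficulties b_1..b_m (item has categories 0..m)
    """
    # Category k's numerator is exp of the cumulative sum of a*(theta - b_j)
    # for j = 1..k; category 0 has an empty sum (0).
    sums = [0.0]
    for bj in b:
        sums.append(sums[-1] + a * (theta - bj))
    expos = [math.exp(s) for s in sums]
    total = sum(expos)
    return [e / total for e in expos]
```

Item scores would then be drawn from these probabilities; higher theta shifts probability mass toward higher score categories.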

3.
Logistic regression is a popular method for detecting uniform and nonuniform differential item functioning (DIF) effects. Theoretical formulas for the power and sample size calculations are derived for likelihood ratio tests and Wald tests based on the asymptotic distribution of the maximum likelihood estimators for the logistic regression model. The power is related to the item response function (IRF) for the studied item, the latent trait distributions, and the sample sizes for the reference and focal groups. Simulation studies show that the theoretical values calculated from the formulas derived in the article are close to those observed in the simulated data when the assumptions are satisfied. The robustness of the power formulas is studied with simulations when the assumptions are violated.
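The link between effect size, standard error, and power can be illustrated with the generic normal-approximation power function for a two-sided Wald test. This is a standard textbook approximation, not the article's exact formulas; the values below are illustrative.

```python
from statistics import NormalDist

def wald_power(beta, se, alpha=0.05):
    """Approximate power of a two-sided Wald test of H0: beta = 0,
    assuming the estimator is approximately normal with standard error se.

    Since se shrinks roughly as 1/sqrt(n), this ties power to sample size.
    """
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)   # critical value, e.g. 1.96 for alpha=.05
    d = abs(beta) / se              # standardized effect
    # Probability the test statistic falls outside (-z, z).
    return nd.cdf(d - z) + nd.cdf(-d - z)
```

At beta = 0 the function returns the nominal alpha, and power grows toward 1 as |beta|/se increases.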

4.
This study examined the effect of similar vs. dissimilar proficiency distributions on uniform DIF detection on a statewide eighth-grade mathematics assessment. Results from similar- and dissimilar-ability reference groups with a students-with-disabilities (SWD) focal group were compared for four models: logistic regression, the hierarchical generalized linear model (HGLM), the Wald-1 IRT-based test, and the Mantel-Haenszel procedure. A DIF-free-then-DIF strategy was used. The rate of DIF detection was examined among all accommodated scores and common accommodation subcategories. No items were detected for DIF using the similar-ability reference group, regardless of method. With the dissimilar-ability reference group, logistic regression and Mantel-Haenszel flagged 8–17% of items, and the Wald-1 test and HGLM flagged 23–38%. Forming focal groups by accommodation type did not alter the pattern of DIF detection. Creating a reference group similar in ability to the focal group may control the rate of erroneous DIF detection for SWD.

5.
The purpose of this study was to examine the performance of differential item functioning (DIF) assessment in the presence of the multilevel structure that often underlies data from large-scale testing programs. Analyses were conducted using logistic regression (LR), a popular, flexible, and effective tool for DIF detection. Data were simulated using a hierarchical framework, as might be seen when examinees are clustered within schools. Both standard LR and hierarchical LR (HLR; accounting for multilevel data) approaches to DIF detection were employed. Results highlight the differences in DIF detection rates when the analytic strategy matches the data structure. Specifically, when the grouping variable was within clusters, LR and HLR performed similarly in terms of Type I error control and power. However, when the grouping variable was between clusters, LR failed to maintain the nominal Type I error rate of .05, whereas HLR was able to maintain it. Power for HLR, however, tended to be low under many conditions in the between-cluster case.

6.
Inspection of differential item functioning (DIF) in translated test items can be informed by graphical comparisons of item response functions (IRFs) across translated forms. Because many forms of DIF can emerge in such analyses, it is important to develop statistical tests that can confirm various characteristics of DIF when present. Traditional nonparametric tests of DIF (Mantel-Haenszel, SIBTEST) are not designed to test for the presence of nonuniform or local DIF, while common probability difference (P-DIF) tests (e.g., SIBTEST) do not optimize power in testing for uniform DIF, and thus may be less useful in the context of graphical DIF analyses. In this article, modifications of three alternative nonparametric statistical tests for DIF, Fisher's χ² test, Cochran's Z test, and Goodman's U test (Marascuilo & Slaughter, 1981), are investigated for these purposes. A simulation study demonstrates the effectiveness of a regression correction procedure in improving the statistical performance of the tests when an internal test score is used as the matching criterion. Simulated power and real data analyses demonstrate the unique information provided by these alternative methods, compared to SIBTEST and Mantel-Haenszel, in confirming various forms of DIF in translated tests.

7.
A logistic regression model for characterizing differential item functioning (DIF) between two groups is presented. A distinction is drawn between uniform and nonuniform DIF in terms of the parameters of the model. A statistic for testing the hypothesis of no DIF is developed. Through simulation studies, it is shown that the logistic regression procedure is more powerful than the Mantel-Haenszel procedure for detecting nonuniform DIF and as powerful in detecting uniform DIF.
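The uniform/nonuniform distinction drawn above can be sketched with the usual parameterization of the logistic regression DIF model; the coefficient names follow convention, and the numeric values in the test are invented for illustration.

```python
import math

def p_correct(score, group, b0, b1, b2, b3):
    """P(correct) under the logistic regression DIF model:

        logit P = b0 + b1*score + b2*group + b3*score*group

    score: matching variable (e.g., total test score)
    group: 0 = reference, 1 = focal
    b2 != 0 -> uniform DIF (a constant logit shift at every score level)
    b3 != 0 -> nonuniform DIF (the group gap changes with score)
    """
    logit = b0 + b1 * score + b2 * group + b3 * score * group
    return 1.0 / (1.0 + math.exp(-logit))
```

With b2 = -0.5 and b3 = 0, for example, the focal group's success probability is shifted down by half a logit at every matched score (uniform DIF); a nonzero b3 would make that gap widen or shrink as score increases (nonuniform DIF).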

8.
In this paper we present a new methodology for detecting differential item functioning (DIF). We introduce a DIF model, called the random item mixture (RIM), that is based on a Rasch model with random item difficulties (besides the usual random person abilities). In addition, a mixture model is assumed for the item difficulties, such that each item may belong to one of two classes: a DIF class or a non-DIF class. The crucial difference between the two is that item difficulties in the DIF class may differ across the observed person groups, while they are equal across person groups for items in the non-DIF class. Statistical inference for the RIM is carried out in a Bayesian framework. The performance of the RIM is evaluated in a simulation study comparing it with traditional procedures: the likelihood ratio test, the Mantel-Haenszel procedure, and the standardized p-DIF procedure. In this comparison, the RIM performs better than the other methods. Finally, the usefulness of the model is demonstrated on a real-life data set.

9.
The TOEFL® iBT has increased the length of each reading passage to better approximate academic reading at North American universities, resulting in a reduction in the number of passages on the reading section of the test. One of the concerns brought about by this change is whether the decrease in topic variety increases the likelihood that an examinee's familiarity with the content of a given passage will influence the examinee's reading performance. This study investigated differential item functioning and differential bundle functioning for six TOEFL® iBT reading passages (N = 8,692), three involving physical science and three involving cultural topics. The majority of items displayed little or no DIF. When all of the items in a passage were examined, none of the passages showed differential functioning at the passage level. Hypotheses are provided for the DIF occurrences. Implications for fairness issues in test development are also discussed.

10.
The purpose of this article is to present logistic discriminant function analysis as a means of differential item functioning (DIF) identification for items that are polytomously scored. The procedure is presented with examples of a DIF analysis using items from a 27-item mathematics test which includes six open-ended response items scored polytomously. The results show that the logistic discriminant function procedure is ideally suited for DIF identification on nondichotomously scored test items. It is simpler and more practical than polytomous extensions of the logistic regression DIF procedure and appears to be more powerful than a generalized Mantel-Haenszel procedure.

11.
Investigations of differential item functioning (DIF) have been conducted mostly on ability tests and have found little evidence of easily interpretable differences across various demographic subgroups. In this study, we examined the degree to which DIF in biographical data items referencing academically relevant background, experiences, and interests was related to differences in judgments about access to these experiences by members of different gender and race subgroups. DIF in the location parameter was significantly related (r = –.51, p < .01) to gender differences in perceived accessibility to experience. No significant relationships with accessibility were observed for DIF in the slope parameter across gender groups or for the slope and location parameters associated with DIF across Black and White groups. Practical implications for use of biodata and theoretical implications for DIF research are discussed.

12.
In this article we present a general approach not relying on item response theory models (non-IRT) to detect differential item functioning (DIF) in dichotomous items in the presence of guessing. The proposed nonlinear regression (NLR) procedure for DIF detection is an extension of the method based on logistic regression. As a non-IRT approach, NLR can be seen as a proxy for detection based on the three-parameter IRT model, which is a standard tool in the field. Hence, NLR fills a logical gap in DIF detection methodology and as such is important for educational purposes. Moreover, the advantages of the NLR procedure as well as comparisons to other commonly used methods are demonstrated in a simulation study. A real data analysis is offered to demonstrate practical use of the method.

13.
Large data sets from a state reading assessment for third and fifth graders were analyzed to examine differential item functioning (DIF), differential distractor functioning (DDF), and differential omission frequency (DOF) between students with particular categories of disabilities (speech/language impairments, learning disabilities, and emotional behavior disorders) and students without disabilities. Multinomial logistic regression was employed to compare response characteristic curves (RCCs) of individual test items. Although no evidence for serious test bias was found for the state assessment examined in this study, the results indicated that students in different disability categories showed different patterns of DIF, DDF, and DOF, and that the use of RCCs helps clarify the implications of DIF and DDF.

14.
Even if national and international assessments are designed to be comparable, subsequent psychometric analyses often reveal differential item functioning (DIF). Central to achieving comparability is examining the presence of DIF and, if DIF is found, investigating its sources to ensure that differentially functioning items do not lead to bias. In this study, sources of DIF were examined using think-aloud protocols. Think-aloud protocols with expert reviewers were conducted to compare the English and French versions of 40 items previously identified as DIF (N = 20) or non-DIF (N = 20). Three highly trained and experienced experts in verifying and accepting/rejecting multi-lingual versions of curriculum and testing materials for government purposes participated in this study. Although there was a considerable amount of agreement in the identification of differentially functioning items, the experts did not consistently identify and distinguish DIF and non-DIF items. Our analyses of the think-aloud protocols identified particular linguistic, general pedagogical, content-related, and cognitive factors related to sources of DIF. Implications are provided for the process of identifying DIF prior to the actual administration of tests at national and international levels.

15.
Mantel-Haenszel and SIBTEST, which have known difficulty in detecting non-unidirectional differential item functioning (DIF), have been adapted with some success for computerized adaptive testing (CAT). This study adapts logistic regression (LR) and the item-response-theory-likelihood-ratio test (IRT-LRT), capable of detecting both unidirectional and non-unidirectional DIF, to the CAT environment in which pretest items are assumed to be seeded in CATs but not used for trait estimation. The proposed adaptation methods were evaluated with simulated data under different sample size ratios and impact conditions in terms of Type I error, power, and specificity in identifying the form of DIF. The adapted LR and IRT-LRT procedures are more powerful than the CAT version of SIBTEST for non-unidirectional DIF detection. The good Type I error control provided by IRT-LRT under extremely unequal sample sizes and large impact is encouraging. Implications of these and other findings are discussed.

16.
In the logistic regression (LR) procedure for differential item functioning (DIF), the parameters of LR have often been estimated using maximum likelihood (ML) estimation. However, ML estimation suffers from finite-sample bias. Furthermore, ML estimation for LR can be substantially biased in the presence of rare-event data. The bias of ML estimation due to small samples and rare-event data can degrade the performance of the LR procedure, especially when testing the DIF of difficult items in small samples. Penalized ML (PML) estimation was originally developed to reduce the finite-sample bias of conventional ML estimation and is also known to reduce the bias in the estimation of LR for rare-event data. The goal of this study is to compare the performance of the LR procedures based on ML and PML estimation in terms of statistical power and Type I error. In a simulation study, Swaminathan and Rogers's Wald test based on PML estimation (PSR) showed the highest statistical power in most of the simulation conditions, and the likelihood ratio test based on PML estimation (PLRT) showed the most robust and stable Type I error. The trade-off between bias and variance is discussed.
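A minimal sketch of why penalization helps with rare events: in the intercept-only logistic model, Firth-type penalization reduces to adding half a success and half a failure, which keeps the estimate finite even when a group answers an item all one way. This is a simplified special case for illustration, not the study's full PML estimator.

```python
import math

def ml_logit_intercept(k, n):
    """ML estimate of the intercept in an intercept-only logistic model
    (k successes out of n trials). The estimate is infinite when the
    data are all-0 or all-1, which is exactly the rare-event failure mode."""
    if k == 0 or k == n:
        raise ValueError("ML estimate is infinite for all-0 or all-1 data")
    p = k / n
    return math.log(p / (1 - p))

def firth_logit_intercept(k, n):
    """Firth-penalized estimate for the same model. The Jeffreys-prior
    penalty reduces here to adding 1/2 success and 1/2 failure, so the
    estimate is always finite, even when k == 0."""
    p = (k + 0.5) / (n + 1)
    return math.log(p / (1 - p))
```

For a very difficult item answered correctly by no one in a small sample (k = 0), the ML estimate diverges while the penalized estimate stays finite, at the cost of a small shrinkage toward 0 (the bias-variance trade-off mentioned above).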

17.
In a previous simulation study of methods for assessing differential item functioning (DIF) in computer-adaptive tests (Zwick, Thayer, & Wingersky, 1993, 1994), modified versions of the Mantel-Haenszel and standardization methods were found to perform well. In that study, data were generated using the 3-parameter logistic (3PL) model and this same model was assumed in obtaining item parameter estimates. In the current study, the 3PL data were used but the Rasch model was assumed in obtaining the item parameter estimates, which determined the information table used for item selection. Although the obtained DIF statistics were highly correlated with the generating DIF values, they tended to be smaller in magnitude than in the 3PL analysis, resulting in a lower probability of DIF detection. This reduced sensitivity appeared to be related to a degradation in the accuracy of matching. Expected true scores from the Rasch-based computer-adaptive test tended to be biased downward, particularly for lower-ability examinees.

18.
In this study, we investigate the logistic regression (LR), Mantel-Haenszel (MH), and Breslow-Day (BD) procedures for the simultaneous detection of both uniform and nonuniform differential item functioning (DIF). A simulation study was used to assess and compare the Type I error rate and power of a combined decision rule (CDR), which assesses DIF using a combination of the decisions made with BD and MH, to those of LR. The results revealed that while the Type I error rate of CDR was consistently below the nominal alpha level, the Type I error rate of LR was high for the conditions having unequal ability distributions. In addition, the power of CDR was consistently higher than that of LR across all forms of DIF.
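The MH ingredient of the combined rule can be sketched as the common odds ratio pooled across matched score strata, here with the conventional ETS delta transformation; the 2x2 counts in the usage example are invented.

```python
import math

def mh_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio across score strata.

    strata: list of (a, b, c, d) tuples, one per matched score level, where
      a = reference-group correct,  b = reference-group incorrect,
      c = focal-group correct,      d = focal-group incorrect.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

def ets_delta(alpha_mh):
    """ETS delta scale: 0 means no DIF; negative values indicate the item
    favors the reference group (DIF against the focal group)."""
    return -2.35 * math.log(alpha_mh)
```

An odds ratio of 1 (delta 0) indicates no uniform DIF; BD-style procedures complement this by testing whether the stratum odds ratios are homogeneous, which is what makes the combined rule sensitive to nonuniform DIF as well.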

19.
Gender fairness in testing can be impeded by the presence of differential item functioning (DIF), which potentially causes test bias. In this study, the presence and causes of gender-related DIF were investigated with real data from 800 items answered by 250,000 test takers. DIF was examined using the Mantel-Haenszel and logistic regression procedures. Little DIF was found in the quantitative items and a moderate amount was found in the verbal items. Vocabulary items favored women if sampled from traditionally female domains but generally not vice versa if sampled from male domains. The sentence completion item format in the English reading comprehension subtest favored men regardless of content. The findings, if supported in a cross-validation study, can potentially lead to changes in how vocabulary items are sampled and in the use of the sentence completion format in English reading comprehension, thereby increasing gender fairness in the examined test.

20.
Nambury S. Raju (1937–2005) developed two model-based indices for differential item functioning (DIF) during his prolific career in psychometrics. Both methods, Raju's area measures (Raju, 1988) and Raju's DFIT (Raju, van der Linden, & Fleer, 1995), are based on quantifying the gap between item characteristic functions (ICFs). This approach provides an intuitive and flexible methodology for assessing DIF. The purpose of this tutorial is to explain DFIT and show how this methodology can be utilized in a variety of DIF applications.
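The "gap between ICFs" that both methods quantify can be illustrated numerically: for two 2PL items with equal discrimination, Raju's closed-form unsigned area is simply |b1 - b2|, which a crude numerical integration reproduces. This is an illustrative sketch of the area idea, not the DFIT procedure itself.

```python
import math

def icc_2pl(theta, a, b):
    """2PL item characteristic curve."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def unsigned_area(a1, b1, a2, b2, lo=-10.0, hi=10.0, steps=20000):
    """Midpoint-rule approximation of the unsigned area between two 2PL
    ICCs over a wide theta range (the tails beyond it are negligible)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        t = lo + (i + 0.5) * h
        total += abs(icc_2pl(t, a1, b1) - icc_2pl(t, a2, b2)) * h
    return total
```

With a1 = a2 = 1.2, b1 = 0.0, and b2 = 0.5 (invented values), the numeric area comes out close to |0.0 - 0.5| = 0.5, matching the closed form for the equal-discrimination case.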


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.) · 京ICP备09084417号