期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

An NCME Instructional Module on Using Differential Step Functioning to Refine the Analysis of DIF in Polytomous Items

Randall D. Penfield Karina Gattamorta Ruth A. Childs 《Educational Measurement》2009,28(1):38-49

Traditional methods for examining differential item functioning (DIF) in polytomously scored test items yield a single item‐level index of DIF and thus provide no information concerning which score levels are implicated in the DIF effect. To address this limitation of DIF methodology, the framework of differential step functioning (DSF) has recently been proposed, whereby measurement invariance is examined within each step underlying the polytomous response variable. The examination of DSF can provide valuable information concerning the nature of the DIF effect (i.e., is the DIF an item‐level effect or an effect isolated to specific score levels), the location of the DIF effect (i.e., precisely which score levels are manifesting the DIF effect), and the potential causes of a DIF effect (i.e., what properties of the item stem or task are potentially biasing). This article presents a didactic overview of the DSF framework and provides specific guidance and recommendations on how DSF can be used to enhance the examination of DIF in polytomous items. An example with real testing data is presented to illustrate the comprehensive information provided by a DSF analysis. 相似文献

2.

A Nested Logit Approach for Investigating Distractors as Causes of Differential Item Functioning

Youngsuk Suh Daniel M. Bolt 《Journal of Educational Measurement》2011,48(2):188-205

In multiple‐choice items, differential item functioning (DIF) in the correct response may or may not be caused by differentially functioning distractors. Identifying distractors as causes of DIF can provide valuable information for potential item revision or the design of new test items. In this paper, we examine a two‐step approach based on application of a nested logit model for this purpose. The approach separates testing of differential distractor functioning (DDF) from DIF, thus allowing for clearer evaluations of where distractors may be responsible for DIF. The approach is contrasted against competing methods and evaluated in simulation and real data analyses. 相似文献

3.

Technique for Evaluating Arithmetic Courses of Study

Philip Jaffe 《Journal of Experimental Education》2013,81(4):351-355

A graphic procedure for studying differential item functioning (DIF) designed to provide diagnostic information for psychometricians and educators is presented. Items from a certifying examination in a subspecialty of internal medicine are used as examples. The procedure provides a “signature” of each test item that may be used in conjunction with a summary statistic to flag items showing DIF. Advantages and limitations of the procedure are noted, as are additional areas for investigation. 相似文献

4.

An NCME Instructional Module on Latent DIF Analysis Using Mixture Item Response Models

Sun‐Joo Cho Youngsuk Suh Woo‐yeol Lee 《Educational Measurement》2016,35(1):48-61

The purpose of this ITEMS module is to provide an introduction to differential item functioning (DIF) analysis using mixture item response models. The mixture item response models for DIF analysis involve comparing item profiles across latent groups, instead of manifest groups. First, an overview of DIF analysis based on latent groups, called latent DIF analysis, is provided and its applications in the literature are surveyed. Then, the methodological issues pertaining to latent DIF analysis are described, including mixture item response models, parameter estimation, and latent DIF detection methods. Finally, recommended steps for latent DIF analysis are illustrated using empirical data. 相似文献

5.

Making sense out of measurement non-invariance: how to explore differences among educational systems in international large-scale assessments

Cuellar Edwin Partchev Ivailo Zwitser Robert Bechger Timo 《Educational Assessment, Evaluation and Accountability》2021,33(1):9-25

International large-scale assessment in education aims to compare educational achievement across many countries. Differences between countries in language, culture, and education give rise to differential item functioning (DIF). For many decades, DIF has been regarded as a nuisance and a threat to validity. In this paper, we take a different stance and argue that DIF holds essential information about the differences between countries. To uncover this information, we explore the use of multivariate analysis techniques as ways to analyze DIF emphasizing visualization. PISA 2012 data are used for illustration.

相似文献

6.

Assessing Differential Step Functioning in Polytomous Items Using a Common Odds Ratio Estimator

Randall D. Penfield 《Journal of Educational Measurement》2007,44(3):187-210

Many statistics used in the assessment of differential item functioning (DIF) in polytomous items yield a single item-level index of measurement invariance that collapses information across all response options of the polytomous item. Utilizing a single item-level index of DIF can, however, be misleading if the magnitude or direction of the DIF changes across the steps underlying the polytomous response process. A more comprehensive approach to examining measurement invariance in polytomous item formats is to examine invariance at the level of each step of the polytomous item, a framework described in this article as differential step functioning (DSF). This article proposes a nonparametric DSF estimator that is based on the Mantel-Haenszel common odds ratio estimator ( Mantel & Haenszel, 1959 ), which is frequently implemented in the detection of DIF in dichotomous items. A simulation study demonstrated that when the level of DSF varied in magnitude or sign across the steps underlying the polytomous response options, the DSF-based approach typically provided a more powerful and accurate test of measurement invariance than did corresponding item-level DIF estimators. 相似文献

7.

Detection of Gender-Related Differential Item Functioning in a Mathematics Performance Assessment

《教育实用测度》2013,26(2):175-199

This study used three different differential item functioning (DIF) detection proce- dures to examine the extent to which items in a mathematics performance assessment functioned differently for matched gender groups. In addition to examining the appropriateness of individual items in terms of DIF with respect to gender, an attempt was made to identify factors (e.g., content, cognitive processes, differences in ability distributions, etc.) that may be related to DIF. The QUASAR (Quantitative Under- standing: Amplifying Student Achievement and Reasoning) Cognitive Assessment Instrument (QCAI) is designed to measure students' mathematical thinking and reasoning skills and consists of open-ended items that require students to show their solution processes and provide explanations for their answers. In this study, 33 polytomously scored items, which were distributed within four test forms, were evaluated with respect to gender-related DIF. The data source was sixth- and seventh- grade student responses to each of the four test forms administrated in the spring of 1992 at all six school sites participatingin the QUASARproject. The sample consisted of 1,782 students with approximately equal numbers of female and male students. The results indicated that DIF may not be serious for 3 1 of the 33 items (94%) in the QCAI. For the two items that were detected as functioning differently for male and female students, several plausible factors for DIF were discussed. The results from the secondary analyses, which removed the mutual influence of the two items, indicated that DIF in one item, PPPl, which favored female students rather than their matched male students, was of particular concern. These secondary analyses suggest that the detection of DIF in the other item in the original analysis may have been due to the influence of Item PPPl because they were both in the same test form. 相似文献

8.

An Empirical Bayes Approach to Mantel-Haenszel DIF Analysis

Rebecca Zwick Dorothy T. Thayer Charles Lewis 《Journal of Educational Measurement》1999,36(1):1-28

We developed an empirical Bayes (EB) enhancement to Mantel-Haenszel (MH) DIF analysis in which we assume that the MH statistics are normally distributed and that the prior distribution of underlying DIF parameters is also normal. We use the posterior distribution of DIF parameters to make inferences about the item's true DIF status and the posterior predictive distribution to predict the item's future observed status. DIF status is expressed in terms of the probabilities associated with each of the five DIF levels defined by the ETS classification system: C–, B–, A, B+, and C+. The EB methods yield more stable DIF estimates than do conventional methods, especially in small samples, which is advantageous in computer-adaptive testing. The EB approach may also convey information about DIF stability in a more useful way by representing the state of knowledge about an item's DIF status as probabilistic. 相似文献

9.

Analyzing Fairness Among Linguistic Minority Populations Using a Latent Class Differential Item Functioning Approach

Maria Elena Oliveri Kadriye Ercikan Juliette Lyons-Thomas Steven Holtzman 《教育实用测度》2013,26(1):17-29

ABSTRACT

Differential item functioning (DIF) analyses have been used as the primary method in large-scale assessments to examine fairness for subgroups. Currently, DIF analyses are conducted utilizing manifest methods using observed characteristics (gender and race/ethnicity) for grouping examinees. Homogeneity of item responses is assumed denoting that all examinees respond to test items using a similar approach. This assumption may not hold with all groups. In this study, we demonstrate the first application of the latent class (LC) approach to investigate DIF and its sources with heterogeneous (linguistic minority groups). We found at least three LCs within each linguistic group, suggesting the need to empirically evaluate this assumption in DIF analysis. We obtained larger proportions of DIF items with larger effect sizes when LCs within language groups versus the overall (majority/minority) language groups were examined. The illustrated approach could be used to improve the ways in which DIF analyses are typically conducted to enhance DIF detection accuracy and score-based inferences when analyzing DIF with heterogeneous populations. 相似文献

10.

Aggregating Polytomous DIF Results Over Multiple Test Administrations

下载免费PDF全文

Rebecca Zwick Lei Ye Steven Isham 《Journal of Educational Measurement》2018,55(1):132-151

In typical differential item functioning (DIF) assessments, an item's DIF status is not influenced by its status in previous test administrations. An item that has shown DIF at multiple administrations may be treated the same way as an item that has shown DIF in only the most recent administration. Therefore, much useful information about the item's functioning is ignored. In earlier work, we developed the Bayesian updating (BU) DIF procedure for dichotomous items and showed how it could be used to formally aggregate DIF results over administrations. More recently, we extended the BU method to the case of polytomously scored items. We conducted an extensive simulation study that included four “administrations” of a test. For the single‐administration case, we compared the Bayesian approach to an existing polytomous‐DIF procedure. For the multiple‐administration case, we compared BU to two non‐Bayesian methods of aggregating the polytomous‐DIF results over administrations. We concluded that both the BU approach and a simple non‐Bayesian method show promise as methods of aggregating polytomous DIF results over administrations. 相似文献

11.

Testing Features of Graphical DIF: Application of a Regression Correction to Three Nonparametric Statistical Tests

Daniel M. Bolt Mark J. Gierl 《Journal of Educational Measurement》2006,43(4):313-333

Inspection of differential item functioning (DIF) in translated test items can be informed by graphical comparisons of item response functions (IRFs) across translated forms. Due to the many forms of DIF that can emerge in such analyses, it is important to develop statistical tests that can confirm various characteristics of DIF when present. Traditional nonparametric tests of DIF (Mantel-Haenszel, SIBTEST) are not designed to test for the presence of nonuniform or local DIF, while common probability difference (P-DIF) tests (e.g., SIBTEST) do not optimize power in testing for uniform DIF, and thus may be less useful in the context of graphical DIF analyses. In this article, modifications of three alternative nonparametric statistical tests for DIF, Fisher's χ ² test, Cochran's Z test, and Goodman's U test ( Marascuilo & Slaughter, 1981 ), are investigated for these purposes. A simulation study demonstrates the effectiveness of a regression correction procedure in improving the statistical performance of the tests when using an internal test score as the matching criterion. Simulation power and real data analyses demonstrate the unique information provided by these alternative methods compared to SIBTEST and Mantel-Haenszel in confirming various forms of DIF in translated tests. 相似文献

12.

DIF Detection and Interpretation in Large-Scale Science Assessments: Informing Item Writing Practices

April L. Zenisky Ronald K. Hambleton Frederic Robin 《Educational Assessment》2013,18(1-2):61-78

Differential item functioning (DIF) analyses are a routine part of the development of large-scale assessments. Less common are studies to understand the potential sources of DIF. The goals of this study were (a) to identify gender DIF in a large-scale science assessment and (b) to look for trends in the DIF and non-DIF items due to content, cognitive demands, item type, item text, and visual-spatial or reference factors. To facilitate the analyses, DIF studies were conducted at 3 grade levels and for 2 randomly equivalent forms of the science assessment at each grade level (administered in different years). The DIF procedure itself was a variant of the "standardization procedure" of Dorans and Kulick (1986) and was applied to very large sets of data (6 sets of data, each involving 60,000 students). It has the advantages of being easy to understand and to explain to practitioners. Several findings emerged from the study that would be useful to pass on to test development committees. For example, when there was DIF in science items, MC items tended to favor male examinees and OR items tended to favor female examinees. Compiling DIF information across multiple grades and years increases the likelihood that important trends in the data will be identified and that item writing practices will be informed by more than anecdotal reports about DIF. 相似文献

13.

二级计分数据DIF模拟研究的数据产生原理及其软件实现

朱乙艺焦丽亚《考试研究》2012,(6):80-87,19

和基于实测数据的DIF研究相比,基于模拟数据的DIF研究不仅可以自由操纵实验条件,而且可以给出检验力和I型错误指标。本文详细阐述了二级计分DIF模拟数据的产生原理,其产生过程包括四个阶段：选择DIF产生思路,选择项目反应理论模型,确定考生特征、题目特征和复本数,计算考生在题目上的正确作答概率并转化为二级计分数据。并且分别利用常用软件Excel和专业软件WinGen3展示了二级计分DIF模拟数据的产生过程。相似文献

14.

Differential Item Functioning Assessment in Cognitive Diagnostic Modeling: Application of the Wald Test to Investigate DIF in the DINA Model

Likun Hou Jimmy de la Torre Ratna Nandakumar 《Journal of Educational Measurement》2014,51(1):98-125

Analyzing examinees’ responses using cognitive diagnostic models (CDMs) has the advantage of providing diagnostic information. To ensure the validity of the results from these models, differential item functioning (DIF) in CDMs needs to be investigated. In this article, the Wald test is proposed to examine DIF in the context of CDMs. This study explored the effectiveness of the Wald test in detecting both uniform and nonuniform DIF in the DINA model through a simulation study. Results of this study suggest that for relatively discriminating items, the Wald test had Type I error rates close to the nominal level. Moreover, its viability was underscored by the medium to high power rates for most investigated DIF types when DIF size was large. Furthermore, the performance of the Wald test in detecting uniform DIF was compared to that of the traditional Mantel‐Haenszel (MH) and SIBTEST procedures. The results of the comparison study showed that the Wald test was comparable to or outperformed the MH and SIBTEST procedures. Finally, the strengths and limitations of the proposed method and suggestions for future studies are discussed. 相似文献

15.

Modeling Instructional Sensitivity Using a Longitudinal Multilevel Differential Item Functioning Approach

Alexander Naumann Jan Hochweber Johannes Hartig 《Journal of Educational Measurement》2014,51(4):381-399

Students’ performance in assessments is commonly attributed to more or less effective teaching. This implies that students’ responses are significantly affected by instruction. However, the assumption that outcome measures indeed are instructionally sensitive is scarcely investigated empirically. In the present study, we propose a longitudinal multilevel‐differential item functioning (DIF) model to combine two existing yet independent approaches to evaluate items’ instructional sensitivity. The model permits for a more informative judgment of instructional sensitivity, allowing the distinction of global and differential sensitivity. Exemplarily, the model is applied to two empirical data sets, with classical indices (Pretest–Posttest Difference Index and posttest multilevel‐DIF) computed for comparison. Results suggest that the approach works well in the application to empirical data, and may provide important information to test developers. 相似文献

16.

Using a Taxonomy of Differential Step Functioning to Improve the Interpretation of DIF in Polytomous Items: An Illustration

Randall D. Penfield Karina Alvarez Okhee Lee 《教育实用测度》2013,26(1):61-78

The assessment of differential item functioning (DIF) in polytomous items addresses between-group differences in measurement properties at the item level, but typically does not inform which score levels may be involved in the DIF effect. The framework of differential step functioning (DSF) addresses this issue by examining between-group differences in the measurement properties at each step underlying the polytomous response variable. The pattern of the DSF effects across the steps of the polytomous response variable can assume several different forms, and the different forms can have different implications for the sensitivity of DIF detection and the final interpretation of the causes of the DIF effect. In this article we propose a taxonomy of DSF forms, establish guidelines for using the form of DSF to help target and guide item content review and item revision, and provide procedural rules for using the frameworks of DSF and DIF in tandem to yield a comprehensive assessment of between-group measurement equivalence in polytomous items. 相似文献

17.

DIF Detection Using Multiple‐Group Categorical CFA With Minimum Free Baseline Approach

下载免费PDF全文

Yu‐Wei Chang Wei‐Kang Huang Rung‐Ching Tsai 《Journal of Educational Measurement》2015,52(2):181-199

The aim of this study is to assess the efficiency of using the multiple‐group categorical confirmatory factor analysis (MCCFA) and the robust chi‐square difference test in differential item functioning (DIF) detection for polytomous items under the minimum free baseline strategy. While testing for DIF items, despite the strong assumption that all but the examined item are set to be DIF‐free, MCCFA with such a constrained baseline approach is commonly used in the literature. The present study relaxes this strong assumption and adopts the minimum free baseline approach where, aside from those parameters constrained for identification purpose, parameters of all but the examined item are allowed to differ among groups. Based on the simulation results, the robust chi‐square difference test statistic with the mean and variance adjustment is shown to be efficient in detecting DIF for polytomous items in terms of the empirical power and Type I error rates. To sum up, MCCFA under the minimum free baseline strategy is useful for DIF detection for polytomous items. 相似文献

18.

Differential Item Functioning for Accommodated Students with Disabilities: Effect of Differences in Proficiency Distributions

Sarah Quesen Suzanne Lane 《教育实用测度》2013,26(4):337-349

ABSTRACT

This study examined the effect of similar vs. dissimilar proficiency distributions on uniform DIF detection on a statewide eighth grade mathematics assessment. Results from the similar- and dissimilar-ability reference groups with an SWD focal group were compared for four models: logistic regression, hierarchical generalized linear model (HGLM), the Wald-1 IRT-based test, and the Mantel-Haenszel procedure. A DIF-free-then-DIF strategy was used. The rate of DIF detection was examined among all accommodated scores and common accommodation subcategories. No items were detected for DIF using the similar ability distribution reference group, regardless of method. With the dissimilar ability reference group, logistic regression and Mantel–Haenszel flagged 8–17%, and the Wald-1 and HGLM test flagged 23–38% of items for DIF. Forming focal groups by accommodation type did not alter the pattern of DIF detection. Creating a reference group to be similar in ability to the focal group may control the rate of erroneous DIF detection for SWD. 相似文献

19.

Effect of Rasch Calibration on Ability and DIF Estimation in Computer-Adaptive Tests

Rebecca Zwick Dorothy T. Thayer Marilyn Wingersky 《Journal of Educational Measurement》1995,32(4):341-363

In a previous simulation study of methods for assessing differential item functioning (DIF) in computer-adaptive tests (Zwick, Thayer, & Wingersky, 1993, 1994), modified versions of the Mantel-Haenszel and standardization methods were found to perform well. In that study, data were generated using the 3-parameter logistic (3PL) model and this same model was assumed in obtaining item parameter estimates. In the current study, the 3PL data were used but the Rasch model was assumed in obtaining the item parameter estimates, which determined the information table used for item selection. Although the obtained DIF statistics were highly correlated with the generating DIF values, they tended to be smaller in magnitude than in the 3PL analysis, resulting in a lower probability of DIF detection. This reduced sensitivity appeared to be related to a degradation in the accuracy of matching. Expected true scores from the Rasch-based computer-adaptive test tended to be biased downward, particularly for lower-ability examinees 相似文献

20.

For Which Boys and Which Girls Are Reading Assessment Items Biased Against? Detection of Differential Item Functioning in Heterogeneous Gender Populations

Raman K. Grover Kadriye Ercikan 《教育实用测度》2017,30(3):178-195

In gender differential item functioning (DIF) research it is assumed that all members of a gender group have similar item response patterns and therefore generalizations from group level to subgroup and individual levels can be made accurately. However DIF items do not necessarily disadvantage every member of a gender group to the same degree, indicating existence of heterogeneity of response patterns within gender groups. In this article the impact of heterogeneity within gender groups on DIF investigations was investigated. Specifically, it was examined whether DIF results varied when comparing males versus females, gender × socioeconomic status subgroups and latent classes of gender. DIF analyses were conducted on reading achievement data from the Canadian sample of the Programme of International Student Assessment 2009. Results indicated considerable heterogeneity within males and females and DIF results were found to vary when heterogeneity was taken into account versus when it was not. 相似文献