1.

Using a Taxonomy of Differential Step Functioning to Improve the Interpretation of DIF in Polytomous Items: An Illustration

Randall D. Penfield, Karina Alvarez, Okhee Lee 《Applied Measurement in Education》2013,26(1):61-78

The assessment of differential item functioning (DIF) in polytomous items addresses between-group differences in measurement properties at the item level, but typically does not inform which score levels may be involved in the DIF effect. The framework of differential step functioning (DSF) addresses this issue by examining between-group differences in the measurement properties at each step underlying the polytomous response variable. The pattern of the DSF effects across the steps of the polytomous response variable can assume several different forms, and the different forms can have different implications for the sensitivity of DIF detection and the final interpretation of the causes of the DIF effect. In this article we propose a taxonomy of DSF forms, establish guidelines for using the form of DSF to help target and guide item content review and item revision, and provide procedural rules for using the frameworks of DSF and DIF in tandem to yield a comprehensive assessment of between-group measurement equivalence in polytomous items.

2.

Assessing Differential Step Functioning in Polytomous Items Using a Common Odds Ratio Estimator

Randall D. Penfield 《Journal of Educational Measurement》2007,44(3):187-210

Many statistics used in the assessment of differential item functioning (DIF) in polytomous items yield a single item-level index of measurement invariance that collapses information across all response options of the polytomous item. Utilizing a single item-level index of DIF can, however, be misleading if the magnitude or direction of the DIF changes across the steps underlying the polytomous response process. A more comprehensive approach to examining measurement invariance in polytomous item formats is to examine invariance at the level of each step of the polytomous item, a framework described in this article as differential step functioning (DSF). This article proposes a nonparametric DSF estimator that is based on the Mantel-Haenszel common odds ratio estimator (Mantel & Haenszel, 1959), which is frequently implemented in the detection of DIF in dichotomous items. A simulation study demonstrated that when the level of DSF varied in magnitude or sign across the steps underlying the polytomous response options, the DSF-based approach typically provided a more powerful and accurate test of measurement invariance than did corresponding item-level DIF estimators.
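
For readers unfamiliar with the Mantel-Haenszel common odds ratio mentioned in this abstract, the following is a minimal sketch of the classical estimator for a single dichotomous outcome. It is not the article's DSF estimator itself. The function name, the tuple layout, and the example counts are assumptions made for illustration only. In a step-level analysis, one would presumably build one set of stratified 2x2 tables per step and apply the same formula to each.

```python
def mh_common_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio over stratified 2x2 tables.

    Each stratum (e.g. one level of the matching variable) is a tuple
    (a, b, c, d), a layout assumed here for illustration:
      a = reference group, positive response
      b = reference group, negative response
      c = focal group, positive response
      d = focal group, negative response
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("estimator undefined: sum of b*c/n is zero")
    return numerator / denominator


# Hypothetical counts for two strata of the matching variable.
example_strata = [(30, 10, 20, 20), (40, 10, 30, 20)]
alpha_mh = mh_common_odds_ratio(example_strata)
print(alpha_mh)
```

A value above 1 indicates that, conditional on the matching variable, the odds of a positive response are higher in the reference group; a value of 1 indicates no difference.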

3.

Distinguishing between Net and Global DIF in Polytomous Items

Randall D. Penfield 《Journal of Educational Measurement》2010,47(2):129-149

In this article, I address two competing conceptions of differential item functioning (DIF) in polytomously scored items. The first conception, referred to as net DIF, concerns between-group differences in the conditional expected value of the polytomous response variable. The second conception, referred to as global DIF, concerns the conditional dependence of group membership and the polytomous response variable. The distinction between net and global DIF is important because different DIF evaluation methods are appropriate for net and global DIF; no currently available method is universally the best for detecting both net and global DIF. Net and global DIF definitions are presented under two different, yet compatible, modeling frameworks: a traditional item response theory (IRT) framework, and a differential step functioning (DSF) framework. The theoretical relationship between the IRT and DSF frameworks is presented. Available methods for evaluating net and global DIF are described, and an applied example of net and global DIF is presented.

4.

Aggregating Polytomous DIF Results Over Multiple Test Administrations

Rebecca Zwick, Lei Ye, Steven Isham 《Journal of Educational Measurement》2018,55(1):132-151

In typical differential item functioning (DIF) assessments, an item's DIF status is not influenced by its status in previous test administrations. An item that has shown DIF at multiple administrations may be treated the same way as an item that has shown DIF in only the most recent administration. Therefore, much useful information about the item's functioning is ignored. In earlier work, we developed the Bayesian updating (BU) DIF procedure for dichotomous items and showed how it could be used to formally aggregate DIF results over administrations. More recently, we extended the BU method to the case of polytomously scored items. We conducted an extensive simulation study that included four “administrations” of a test. For the single‐administration case, we compared the Bayesian approach to an existing polytomous‐DIF procedure. For the multiple‐administration case, we compared BU to two non‐Bayesian methods of aggregating the polytomous‐DIF results over administrations. We concluded that both the BU approach and a simple non‐Bayesian method show promise as methods of aggregating polytomous DIF results over administrations.

5.

A Nested Logit Approach for Investigating Distractors as Causes of Differential Item Functioning

Youngsuk Suh, Daniel M. Bolt 《Journal of Educational Measurement》2011,48(2):188-205

In multiple‐choice items, differential item functioning (DIF) in the correct response may or may not be caused by differentially functioning distractors. Identifying distractors as causes of DIF can provide valuable information for potential item revision or the design of new test items. In this paper, we examine a two‐step approach based on application of a nested logit model for this purpose. The approach separates testing of differential distractor functioning (DDF) from DIF, thus allowing for clearer evaluations of where distractors may be responsible for DIF. The approach is contrasted against competing methods and evaluated in simulation and real data analyses.

6.

DIF Detection Using Multiple‐Group Categorical CFA With Minimum Free Baseline Approach

Yu‐Wei Chang, Wei‐Kang Huang, Rung‐Ching Tsai 《Journal of Educational Measurement》2015,52(2):181-199

The aim of this study is to assess the efficiency of using the multiple‐group categorical confirmatory factor analysis (MCCFA) and the robust chi‐square difference test in differential item functioning (DIF) detection for polytomous items under the minimum free baseline strategy. While testing for DIF items, despite the strong assumption that all but the examined item are set to be DIF‐free, MCCFA with such a constrained baseline approach is commonly used in the literature. The present study relaxes this strong assumption and adopts the minimum free baseline approach where, aside from those parameters constrained for identification purposes, parameters of all but the examined item are allowed to differ among groups. Based on the simulation results, the robust chi‐square difference test statistic with the mean and variance adjustment is shown to be efficient in detecting DIF for polytomous items in terms of the empirical power and Type I error rates. To sum up, MCCFA under the minimum free baseline strategy is useful for DIF detection for polytomous items.

7.

A Comparison of Adjacent Categories and Cumulative Differential Step Functioning Effect Estimators

Karina A. Gattamorta, Randall D. Penfield 《Applied Measurement in Education》2013,26(2):142-161

The study of measurement invariance in polytomous items that targets individual score levels is known as differential step functioning (DSF). The analysis of DSF requires the creation of a set of dichotomizations of the item response variable. There are two primary approaches for creating the set of dichotomizations to conduct a DSF analysis: the adjacent categories approach, and the cumulative approach. To date, there is limited research on how these two approaches compare within the context of DSF, particularly as applied to a real data set. This study evaluated the results of a DSF analysis using both dichotomization schemes in order to determine if the two approaches yield similar results. The results revealed that the two approaches generally led to consistent results, particularly in the case where DSF effects were negligible. However, when significant DSF effects were present, the two approaches occasionally led to differing conclusions.
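
The two dichotomization schemes named in this abstract can be sketched as follows. This is an illustrative reading of the standard definitions, not code from the study: for step j, the cumulative scheme contrasts scores at or above j with scores below j, while the adjacent categories scheme contrasts score j with score j - 1 and sets all other scores aside. The function names and the example scores are hypothetical.

```python
def cumulative_recode(score, step):
    """Cumulative scheme: 1 if the score is at or above the step, else 0."""
    return 1 if score >= step else 0


def adjacent_recode(score, step):
    """Adjacent categories scheme: compare category `step` with `step - 1`.

    Scores outside those two categories are excluded (returned as None).
    """
    if score == step:
        return 1
    if score == step - 1:
        return 0
    return None


def dichotomize(scores, max_score, recode):
    """Build one recoded response vector per step 1..max_score."""
    return {
        step: [recode(s, step) for s in scores]
        for step in range(1, max_score + 1)
    }


# Hypothetical responses to an item scored 0 to 3.
scores = [0, 1, 2, 3, 2, 1]
cumulative = dichotomize(scores, 3, cumulative_recode)
adjacent = dichotomize(scores, 3, adjacent_recode)
print(cumulative[2])
print(adjacent[2])
```

Because the adjacent categories scheme discards respondents outside the two compared categories, each step is estimated from a different and usually smaller subsample, which is one plausible reason the two schemes can lead to different conclusions.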

8.

A Generalized DIF Effect Variance Estimator for Measuring Unsigned Differential Test Functioning in Mixed Format Tests

Randall D. Penfield, James Algina 《Journal of Educational Measurement》2006,43(4):295-312

One approach to measuring unsigned differential test functioning is to estimate the variance of the differential item functioning (DIF) effect across the items of the test. This article proposes two estimators of the DIF effect variance for tests containing dichotomous and polytomous items. The proposed estimators are direct extensions of the noniterative estimators developed by Camilli and Penfield (1997) for tests composed of dichotomous items. A small simulation study is reported in which the statistical properties of the generalized variance estimators are assessed, and guidelines are proposed for interpreting values of DIF effect variance estimators.

9.

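Common Problems in Practical Applications of DIF Analysis and New Research Developments (total citations: 1; self-citations: 0; citations by others: 1)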

Li Lingyan, Zhang Xun 《考试研究》 (Examinations Research) 2010,(2):73-82

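Polytomously scored items, small samples, impure matching variables, and the analysis of causes after DIF testing are common problems in DIF testing. Recent developments in DIF research include DSF analysis for polytomously scored items, smoothing methods for DIF detection in small samples, the MIMIC method for cases where the matching variable is impure, and the use of logistic models to analyze causes after DIF testing. Our analysis of these developments leads us to believe that combining multiple detection methods, using DIF research to explore latent variables within a multidimensional IRT framework, and similar approaches may make DIF research one of the foundational research areas of measurement in the future.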

10.

Logistic Regression and Its Use in Detecting Differential Item Functioning in Polytomous Items

Ann W. French, Timothy R. Miller 《Journal of Educational Measurement》1996,33(3):315-332

A computer simulation study was conducted to determine the feasibility of using logistic regression procedures to detect differential item functioning (DIF) in polytomous items. One item in a simulated test of 25 items contained DIF; parameters for that item were varied to create three conditions of nonuniform DIF and one of uniform DIF. Item scores were generated using a generalized partial credit model, and the data were recoded into multiple dichotomies in order to use logistic regression procedures. Results indicate that logistic regression is powerful in detecting most forms of DIF; however, it required large amounts of data manipulation, and interpretation of the results was sometimes difficult. Some logistic regression procedures may be useful in the post hoc analysis of DIF for polytomous items.

11.

DIF Detection and Effect Size Measures for Polytomously Scored Items

Seock-Ho Kim, Allan S. Cohen, Cigdem Alagoz, Sukwoo Kim 《Journal of Educational Measurement》2007,44(2):93-116

Data from a large-scale performance assessment (N = 105,731) were analyzed with five differential item functioning (DIF) detection methods for polytomous items to examine the congruence among the DIF detection methods. Two different versions of the item response theory (IRT) model-based likelihood ratio test, the logistic regression likelihood ratio test, the Mantel test, and the generalized Mantel–Haenszel test were compared. Results indicated some agreement among the five DIF detection methods. Because statistical power is a function of the sample size, the DIF detection results from extremely large data sets are not practically useful. As alternatives to the DIF detection methods, four IRT model-based indices of standardized impact and four observed-score indices of standardized impact for polytomous items were obtained and compared with the R² measures of logistic regression.

12.

Logistic Discriminant Function Analysis for DIF Identification of Polytomously Scored Items

Timothy R. Miller, Judith A. Spray 《Journal of Educational Measurement》1993,30(2):107-122

The purpose of this article is to present logistic discriminant function analysis as a means of differential item functioning (DIF) identification of items that are polytomously scored. The procedure is presented with examples of a DIF analysis using items from a 27-item mathematics test which includes six open-ended response items scored polytomously. The results show that the logistic discriminant function procedure is ideally suited for DIF identification on nondichotomously scored test items. It is simpler and more practical than polytomous extensions of the logistic regression DIF procedure and appears to be more powerful than a generalized Mantel-Haenszel procedure.

13.

Using Log-Linear Smoothing to Improve Small-Sample DIF Estimation

Gautam Puhan, Timothy P. Moses, Lei Yu, Neil J. Dorans 《Journal of Educational Measurement》2009,46(1):59-83

This study examined the extent to which log-linear smoothing could improve the accuracy of differential item functioning (DIF) estimates in small samples of examinees. Examinee responses from a certification test were analyzed using White examinees in the reference group and African American examinees in the focal group. Using a simulation approach, separate DIF estimates for seven small-sample-size conditions were obtained using unsmoothed (U) and smoothed (S) score distributions. These small sample U and S DIF estimates were compared to a criterion (i.e., DIF estimates obtained using the unsmoothed total data) to assess their degree of variability (random error) and accuracy (bias). Results indicate that for most studied items smoothing the raw score distributions reduced random error and bias of the DIF estimates, especially in the small-sample-size conditions. Implications of these results for operational testing programs are discussed.

14.

Stepwise Analysis of Differential Item Functioning Based on Multiple-Group Partial Credit Model

Eiji Muraki 《Journal of Educational Measurement》1999,36(3):217-232

Bock, Muraki, and Pfeiffenberger (1988) proposed a dichotomous item response theory (IRT) model for the detection of differential item functioning (DIF), and they estimated the IRT parameters and the means and standard deviations of the multiple latent trait distributions. This IRT DIF detection method is extended to the partial credit model (Masters, 1982; Muraki, 1993) and presented as one of the multiple-group IRT models. Uniform and non-uniform DIF items and heterogeneous latent trait distributions were used to generate polytomous responses of multiple groups. The DIF method was applied to this simulated data using a stepwise procedure. The standardized DIF measures for slope and item location parameters successfully detected the non-uniform and uniform DIF items as well as recovered the means and standard deviations of the latent trait distributions. This stepwise DIF analysis based on the multiple-group partial credit model was then applied to the National Assessment of Educational Progress (NAEP) writing trend data.

15.

Assessing Differential Item Functioning in Direct Writing Assessments: Problems and an Example

Catherine J. Welch, Timothy R. Miller 《Journal of Educational Measurement》1995,32(2):163-178

The recent emphasis on various types of performance assessments raises questions concerning the differential effects of such assessments on population subgroups. Procedures for detecting differential item functioning (DIF) in data from performance assessments are available but may be hindered by problems that stem from this mode of assessment. Foremost among these are problems related to finding an appropriate matching variable. These problems are discussed and results are presented for three methods for DIF detection in polytomous items using data from a direct writing assessment. The purpose of the study is to examine the effects of using different combinations of internal and external matching variables. The procedures included a generalized Mantel-Haenszel statistic, a technique based on meta-analysis methodology, and logistic discriminant function analysis. In general, the results did not support the use of an external matching criterion and indicated that continued problems may be expected in attempts to assess DIF in performance assessments.

16.

An Approach for Categorizing DIF in Polytomous Items

Randall D. Penfield 《Applied Measurement in Education》2013,26(3):335-355

A widely used approach for categorizing the level of differential item functioning (DIF) in dichotomous items is the scheme proposed by Educational Testing Service (ETS) based on a transformation of the Mantel-Haenszel common odds ratio. In this article two classification schemes for DIF in polytomous items (referred to as the P1 and P2 schemes) are proposed that parallel the criteria set forth in the ETS scheme for dichotomous items. The theoretical equivalence of the P1 and P2 schemes to the ETS scheme is described, and the results of a simulation study conducted to examine the empirical equivalence of the P1 and P2 schemes to the ETS scheme are presented.
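
As background for the ETS scheme for dichotomous items that this abstract refers to, the sketch below shows the commonly cited form of the rule: the common odds ratio is transformed to the delta metric as MH D-DIF = -2.35 ln(alpha), and items are labeled A (negligible), B (moderate), or C (large). The details are supplied from general knowledge of the ETS rule rather than from the abstract, and the P1 and P2 schemes for polytomous items are not reproduced here. The function names are hypothetical, and the two significance decisions are passed in as booleans on the assumption that they were obtained from separate tests.

```python
from math import log


def mh_d_dif(alpha_mh):
    """Transform a Mantel-Haenszel common odds ratio to the ETS delta metric."""
    return -2.35 * log(alpha_mh)


def ets_category(alpha_mh, differs_from_zero, exceeds_one):
    """Assign an ETS-style A/B/C label for a dichotomous item.

    differs_from_zero: MH D-DIF is significantly different from 0
    exceeds_one:       |MH D-DIF| is significantly greater than 1
    Both flags are assumed to come from tests run elsewhere.
    """
    magnitude = abs(mh_d_dif(alpha_mh))
    if magnitude < 1.0 or not differs_from_zero:
        return "A"  # negligible DIF
    if magnitude >= 1.5 and exceeds_one:
        return "C"  # large DIF
    return "B"      # moderate DIF


# Hypothetical odds ratios and test outcomes.
labels = [
    ets_category(1.0, False, False),
    ets_category(1.6, True, False),
    ets_category(2.0, True, True),
]
print(labels)
```

The negative sign in the transformation means that an odds ratio above 1 (favoring the reference group) maps to a negative MH D-DIF value; the classification uses only its absolute size.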

17.

Test fairness: Examining differential functioning of the reading comprehension section of the GSEEE in China

《Studies in Educational Evaluation》2020

This study investigated differential item functioning (DIF), differential bundle functioning (DBF), and differential test functioning (DTF) across gender of the reading comprehension section of the Graduate School Entrance English Exam in China. The datasets included 10,000 test-takers’ item-level responses to 6 five-item testlets. Both DIF and DBF were examined by using poly-simultaneous item bias test and item-response-theory-likelihood-ratio test, and DTF was investigated with multi-group confirmatory factor analyses (MG-CFA). The results indicated that although none of the 30 items exhibited statistically and practically significant DIF across gender at the item level, 2 testlets were consistently identified as having significant DBF at the testlet level by the two procedures. Nonetheless, DBF does not manifest itself at the overall test score level to produce DTF based on MG-CFA. This suggests that the relationship between item-level DIF and test-level DTF is a complicated issue with the mediating effect of testlets in testlet-based language assessment.

18.

Possible Determinants of Differential Item Functioning: Familiarity, Interest, and Emotional Reaction

Lawrence J. Stricker, Walter Emmerich 《Journal of Educational Measurement》1999,36(4):347-366

This study evaluated the connection between gender differences in examinees' familiarity, interest, and negative emotional reactions to items on the Advanced Placement Psychology Examination and the items' gender differential item functioning (DIF). Gender DIF and gender differences in interest varied appreciably with the content of the items. Gender differences in the three variables were substantially related to the items' gender DIF (e.g., R = .50). Much of the gender DIF on this test may be attributable to gender differences in these variables.

19.

Development and Demonstration of Multidimensional IRT-Based Internal Measures of Differential Functioning of Items and Tests

T. C. Oshima, Nambury S. Raju, Claudia P. Flowers 《Journal of Educational Measurement》1997,34(3):253-272

This article defines and demonstrates a framework for studying differential item functioning (DIF) and differential test functioning (DTF) for tests that are intended to be multidimensional. The procedure introduced here is an extension of unidimensional differential functioning of items and tests (DFIT) recently developed by Raju, van der Linden, & Fleer (1995). To demonstrate the usefulness of these new indexes in a multidimensional IRT setting, two-dimensional data were simulated with known item parameters and known DIF and DTF. The DIF and DTF indexes were recovered reasonably well under various distributional differences of θs after multidimensional linking was applied to put the two sets of item parameters on a common scale. Further studies are suggested in the area of DIF/DTF for intentionally multidimensional tests.

20.

Identifying Sources of Differential Item and Bundle Functioning on Translated Achievement Tests: A Confirmatory Analysis

Mark J. Gierl, Shameem Nyla Khaliq 《Journal of Educational Measurement》2001,38(2):164-187

Increasingly, tests are being translated and adapted into different languages. Differential item functioning (DIF) analyses are often used to identify non-equivalent items across language groups. However, few studies have focused on understanding why some translated items produce DIF. The purpose of the current study is to identify sources of differential item and bundle functioning on translated achievement tests using substantive and statistical analyses. A substantive analysis of existing DIF items was conducted by an 11-member committee of testing specialists. In their review, four sources of translation DIF were identified. Two certified translators used these four sources to categorize a new set of DIF items from Grade 6 and 9 Mathematics and Social Studies Achievement Tests. Each item was associated with a specific source of translation DIF and each item was anticipated to favor a specific group of examinees. Then, a statistical analysis was conducted on the items in each category using SIBTEST. The translators sorted the mathematics DIF items into three sources, and they correctly predicted the group that would be favored for seven of the eight items or bundles of items across two grade levels. The translators sorted the social studies DIF items into four sources, and they correctly predicted the group that would be favored for eight of the 13 items or bundles of items across two grade levels. The majority of items in mathematics and social studies were associated with differences in the words, expressions, or sentence structure of items that are not inherent to the language and/or culture. By combining substantive and statistical DIF analyses, researchers can study the sources of DIF and create a body of confirmed DIF hypotheses that may be used to develop guidelines and test construction principles for reducing DIF on translated tests.