Similar Articles
1.
Gender fairness in testing can be impeded by the presence of differential item functioning (DIF), which potentially causes test bias. In this study, the presence and causes of gender-related DIF were investigated with real data from 800 items answered by 250,000 test takers. DIF was examined using the Mantel–Haenszel and logistic regression procedures. Little DIF was found in the quantitative items and a moderate amount was found in the verbal items. Vocabulary items sampled from traditionally female domains favored women, but items sampled from traditionally male domains generally did not favor men. The sentence completion item format in the English reading comprehension subtest favored men regardless of content. If supported in a cross-validation study, the findings could lead to changes in how vocabulary items are sampled and in the use of the sentence completion format in English reading comprehension, thereby increasing gender fairness in the examined test.
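The Mantel–Haenszel procedure named above compares, within each matched total-score level, the odds that reference-group and focal-group examinees answer an item correctly, and pools those comparisons into a common odds ratio. The following is a minimal Python sketch, assuming the responses have already been tabulated into one 2×2 table per score level; the `Stratum` class and `mantel_haenszel_dif` function are hypothetical names, and the study does not say how its own computations were implemented. The result is also reported on the ETS delta scale, MH D-DIF = −2.35 ln(α_MH).

```python
import math
from dataclasses import dataclass


@dataclass
class Stratum:
    """Hypothetical container: the 2x2 table for one matched score level k."""
    ref_correct: int    # A_k: reference group, item answered correctly
    ref_wrong: int      # B_k: reference group, item answered incorrectly
    focal_correct: int  # C_k: focal group, item answered correctly
    focal_wrong: int    # D_k: focal group, item answered incorrectly


def mantel_haenszel_dif(strata):
    """Return (alpha_MH, MH D-DIF) for one item pooled over score strata.

    alpha_MH = sum(A_k * D_k / N_k) / sum(B_k * C_k / N_k).
    MH D-DIF = -2.35 * ln(alpha_MH); negative values mean the item
    favors the reference group, positive values the focal group.
    """
    num = 0.0
    den = 0.0
    for s in strata:
        n_k = s.ref_correct + s.ref_wrong + s.focal_correct + s.focal_wrong
        if n_k == 0:
            continue  # an empty score level contributes nothing
        num += s.ref_correct * s.focal_wrong / n_k
        den += s.ref_wrong * s.focal_correct / n_k
    if num == 0 or den == 0:
        raise ValueError("odds ratio undefined for these tables")
    alpha = num / den
    return alpha, -2.35 * math.log(alpha)
```

In a gender analysis, which group is treated as reference and which as focal is a labeling choice; with men as the reference group, negative MH D-DIF values would indicate items favoring men.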

2.
Investigations of differential item functioning (DIF) have been conducted mostly on ability tests and have found little evidence of easily interpretable differences across various demographic subgroups. In this study, we examined the degree to which DIF in biographical data items referencing academically relevant background, experiences, and interests was related to differences in judgments about access to these experiences by members of different gender and race subgroups. DIF in the location parameter was significantly related (r = –.51, p < .01) to gender differences in perceived accessibility to experience. No significant relationships with accessibility were observed for DIF in the slope parameter across gender groups or for the slope and location parameters associated with DIF across Black and White groups. Practical implications for use of biodata and theoretical implications for DIF research are discussed.

3.
The standardization approach to assessing differential item functioning (DIF), including standardized distractor analysis, is described. The results of studies conducted on Asian Americans, Hispanics (Mexican Americans and Puerto Ricans), and Blacks on the Scholastic Aptitude Test (SAT) are described and then synthesized across studies. Where the groups were limited to include only examinees who spoke English as their best language, very few items across forms and ethnic groups exhibited large DIF. Major findings include evidence of differential speededness for Blacks and Hispanics (minority examinees did not complete SAT-Verbal sections at the same rate as White students with comparable SAT-Verbal scores) and, when the item content is of special interest to a particular ethnic group, advantages for that group. In addition, homographs tend to disadvantage all three ethnic groups, but the effect of vertical relationships in analogy items is less consistent. Although these findings are important in understanding DIF, they do not seem to account for all differences. Other variables related to DIF still need to be identified. Furthermore, these findings should be regarded as tentative until corroborated by studies using controlled data collection designs.
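The standardization approach described above compares the proportions of focal-group and reference-group examinees answering an item correctly at each matched score level and averages the differences, conventionally weighting each level by its number of focal-group examinees (the STD P-DIF index). The following is a minimal sketch under that assumption; the function name and input layout are illustrative, not taken from the studies summarized here.

```python
def std_p_dif(levels):
    """Standardized p-difference (STD P-DIF) for one item.

    levels: iterable of (n_focal, p_focal, p_ref) tuples, one per matched
    score level, where p_* is the proportion answering correctly.
    Each level is weighted by its focal-group count; a positive result
    means the item favors the focal group at equal matched scores.
    """
    weighted_diff = 0.0
    total_weight = 0
    for n_focal, p_focal, p_ref in levels:
        weighted_diff += n_focal * (p_focal - p_ref)
        total_weight += n_focal
    if total_weight == 0:
        raise ValueError("no focal-group examinees at any score level")
    return weighted_diff / total_weight
```

The same computation can serve for standardized distractor analysis, or for examining differential speededness, if the proportions are those choosing a particular distractor or not reaching the item rather than those answering correctly.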

4.
What two major approaches have been used to study gender bias in test scores? How do statistical DIF detection methods differ? How does DIF screening of items affect mean score differences?

5.
Applied Measurement in Education, 2013, 26(2): 175–199
This study used three different differential item functioning (DIF) detection procedures to examine the extent to which items in a mathematics performance assessment functioned differently for matched gender groups. In addition to examining the appropriateness of individual items in terms of DIF with respect to gender, an attempt was made to identify factors (e.g., content, cognitive processes, and differences in ability distributions) that may be related to DIF. The QUASAR (Quantitative Understanding: Amplifying Student Achievement and Reasoning) Cognitive Assessment Instrument (QCAI) is designed to measure students' mathematical thinking and reasoning skills and consists of open-ended items that require students to show their solution processes and provide explanations for their answers. In this study, 33 polytomously scored items, distributed across four test forms, were evaluated with respect to gender-related DIF. The data source was sixth- and seventh-grade student responses to each of the four test forms administered in the spring of 1992 at all six school sites participating in the QUASAR project. The sample consisted of 1,782 students with approximately equal numbers of female and male students. The results indicated that DIF may not be serious for 31 of the 33 items (94%) in the QCAI. For the two items that were detected as functioning differently for male and female students, several plausible factors for DIF were discussed. The results from the secondary analyses, which removed the mutual influence of the two items, indicated that DIF in one item, PPPl, which favored female students over their matched male students, was of particular concern. These secondary analyses suggest that the detection of DIF in the other item in the original analysis may have been due to the influence of Item PPPl, because both items were in the same test form.

6.
The study investigates the consequences of eliminating items showing gender-specific differential item functioning (DIF) for the psychometric structure of a standard RIASEC interest inventory. Holland’s hexagonal model was tested for structural invariance using a confirmatory methodological approach (confirmatory factor analysis and randomization tests of hypothesized order relations). Results suggest that eliminating items showing gender-specific DIF had no substantial influence on the instrument’s psychometric structure. The use of DIF analysis as one way to improve test fairness when developing interest inventories is discussed.

7.
Once a differential item functioning (DIF) item has been identified, little is known about the examinees for whom the item functions differentially. This is because DIF analysis focuses on manifest group characteristics that are associated with DIF but do not explain why examinees respond differentially to items. We first analyze item response patterns for gender DIF and then illustrate, through the use of a mixture item response theory (IRT) model, how the manifest characteristic associated with DIF often has a very weak relationship with the latent groups actually advantaged or disadvantaged by the item(s). Next, we propose an alternative approach to DIF assessment that first uses an exploratory mixture model analysis to define the primary dimension(s) that contribute to DIF and then studies examinee characteristics associated with those dimensions in order to understand the cause(s) of DIF. Comparison of the academic characteristics of these examinees across latent classes reveals some clear differences in manifest characteristics between groups.

8.
Differential item functioning (DIF) analyses are a routine part of the development of large-scale assessments. Less common are studies to understand the potential sources of DIF. The goals of this study were (a) to identify gender DIF in a large-scale science assessment and (b) to look for trends in the DIF and non-DIF items due to content, cognitive demands, item type, item text, and visual-spatial or reference factors. To facilitate the analyses, DIF studies were conducted at 3 grade levels and for 2 randomly equivalent forms of the science assessment at each grade level (administered in different years). The DIF procedure itself was a variant of the "standardization procedure" of Dorans and Kulick (1986) and was applied to very large data sets (6 data sets, each involving 60,000 students). The procedure has the advantage of being easy to understand and to explain to practitioners. Several findings emerged from the study that would be useful to pass on to test development committees. For example, when there was DIF in science items, multiple-choice (MC) items tended to favor male examinees and open-response (OR) items tended to favor female examinees. Compiling DIF information across multiple grades and years increases the likelihood that important trends in the data will be identified and that item writing practices will be informed by more than anecdotal reports about DIF.

9.
Progress has been made in developing statistical methods for identifying DIF items, but procedures to aid the substantive interpretation of these items have lagged behind. To overcome this problem, Roussos and Stout (1996) proposed a multidimensionality-based DIF analysis paradigm. We illustrate and evaluate an application of this framework to the study of gender differences in mathematics. Four characteristics distinguish this study from previous research: (a) the substantive analysis was guided by past research on the content and cognitive-related sources of gender differences in mathematics achievement, as presented in the taxonomy by Gallagher, De Lisi, Holst, McGillicuddy-De Lisi, Morely, and Cahalan (2000); (b) the substantive analysis was conducted by reviewers who were highly knowledgeable about the cognitive strategies students use to solve math problems; (c) three statistical methods were used to test hypotheses about gender differences: SIBTEST, DIMTEST, and multiple linear regression; and (d) the data were from a curriculum-based achievement test developed with the goal of minimizing obvious, content-related gender differences. We show that the framework can lead to clearly interpretable results, and we highlight both the strengths and weaknesses of applying the Roussos and Stout framework to the study of group differences.

10.
This paper considers a modification of the DIF procedure SIBTEST for investigating the causes of differential item functioning (DIF). One way in which factors believed to be responsible for DIF can be investigated is by systematically manipulating them across multiple versions of an item using a randomized DIF study (Schmitt, Holland, & Dorans, 1993). In this paper, it is shown that the additivity of the index used for testing DIF in SIBTEST motivates a new extension of the method for statistically testing the effects of DIF factors. Because an important consideration is whether or not a studied DIF factor is consistent in its effects across items, a methodology for testing item × factor interactions is also presented. Using data from the mathematical sections of the Scholastic Assessment Test (SAT), the effects of two potential DIF factors, item format (multiple-choice versus open-ended) and problem type (abstract versus concrete), are investigated for gender. Results suggest a small but statistically significant and consistent effect of item format (favoring males for multiple-choice items) across items, and a larger but less consistent effect due to problem type.
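The additivity property mentioned above can be illustrated with a simplified, SIBTEST-style index: a weighted difference between reference-group and focal-group mean scores on the studied item across matching-score levels. This sketch deliberately omits SIBTEST's regression correction of the group means and its standard error, so it is not SIBTEST itself; it only shows why such an index is additive. When items share the same matching levels and weights, the index for a bundle (whose mean at each level is the sum of the item means) equals the sum of the item indices. The function names and the choice of weights are assumptions for illustration.

```python
def beta_uni(levels):
    """Simplified SIBTEST-style index, WITHOUT the regression correction.

    levels: list of (weight, mean_ref, mean_focal) per matching-score level,
    where the means are studied-item (or bundle) scores at that level.
    Positive values favor the reference group.
    """
    total = sum(w for w, _, _ in levels)
    return sum(w * (m_ref - m_foc) for w, m_ref, m_foc in levels) / total


def bundle_levels(item_levels):
    """Combine per-item level statistics into bundle statistics.

    item_levels: one list per item, aligned on the same score levels.
    Bundle means are sums of item means; weights must match across items.
    """
    combined = []
    for rows in zip(*item_levels):
        weight = rows[0][0]
        if any(r[0] != weight for r in rows):
            raise ValueError("items must share weights at each score level")
        combined.append((weight,
                         sum(r[1] for r in rows),
                         sum(r[2] for r in rows)))
    return combined
```

Because the weighted average of a sum is the sum of the weighted averages, the bundle index equals the sum of its item indices, which is the property that allows effects of a manipulated factor to be aggregated over items.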

11.
In gender differential item functioning (DIF) research, it is assumed that all members of a gender group have similar item response patterns and therefore that generalizations from the group level to subgroup and individual levels can be made accurately. However, DIF items do not necessarily disadvantage every member of a gender group to the same degree, which indicates heterogeneity of response patterns within gender groups. This article examines the impact of heterogeneity within gender groups on DIF investigations. Specifically, the study examined whether DIF results varied when comparing males versus females, gender × socioeconomic status subgroups, and latent classes within gender groups. DIF analyses were conducted on reading achievement data from the Canadian sample of the Programme for International Student Assessment (PISA) 2009. Results indicated considerable heterogeneity within males and females, and DIF results varied depending on whether heterogeneity was taken into account.

12.
This was a study of differential item functioning (DIF) for grades 4, 7, and 10 reading and mathematics items from state criterion-referenced tests. The tests were composed of multiple-choice and constructed-response items. Gender DIF was investigated using POLYSIBTEST and a Rasch procedure. The Rasch procedure flagged more items for DIF than did the simultaneous item bias procedure—particularly multiple-choice items. For both reading and mathematics tests, multiple-choice items generally favored males while constructed-response items generally favored females. Content analyses showed that flagged reading items typically measured text interpretations or implied meanings; males tended to benefit from items that asked them to identify reasonable interpretations and analyses of informational text. Most items that favored females asked students to make their own interpretations and analyses, of both literary and informational text, supported by text-based evidence. Content analysis of mathematics items showed that items favoring males measured geometry, probability, and algebra. Mathematics items favoring females measured statistical interpretations, multistep problem solving, and mathematical reasoning.

13.
This study established a Chinese scale for measuring high school students’ ocean literacy, including testing its reliability, validity, and differential item functioning (DIF), with the aim of addressing the lack of DIF testing for existing scales. The construct validity and reliability of the scale’s items were examined using the Rasch model, and a gender DIF test was conducted to ensure the fairness of test results when different groups were compared. The results indicated that the scale is unidimensional and has good internal consistency and construct validity. The gender DIF test indicated that several items were difficult for either female or male students to answer correctly; however, experts and scholars reviewed these items individually and recommended retaining them. The final Chinese version of the ocean literacy scale comprises 48 items that reflect high school students’ understanding of ocean literacy, which can help students understand the marine science topics they encounter in real life.

14.
Identifying the Causes of DIF in Translated Verbal Items
Translated tests are being used increasingly for assessing the knowledge and skills of individuals who speak different languages. There is little research exploring why translated items sometimes function differently across languages. If the sources of differential item functioning (DIF) across languages could be predicted, this could have important implications for test development, scoring, and equating. This study focuses on two questions: “Is DIF related to item type?” and “What are the causes of DIF?” The data were taken from the Israeli Psychometric Entrance Test in Hebrew (source) and Russian (translated). The results indicated that 34% of the items functioned differentially across languages. The analogy items were the most problematic, with 65% showing DIF, mostly in favor of the Russian-speaking examinees. The sentence completion items were also a problem (45% DIF). The main reasons for DIF were changes in word difficulty, changes in item format, differences in cultural relevance, and changes in content.

15.
This study examined the effect of sample size ratio and model misfit on the Type I error rates and power of the Difficulty Parameter Differences procedure using Winsteps. A unidimensional 30-item test with responses from 130,000 examinees was simulated, and four independent variables were manipulated: sample size ratio (20/100/250/500/1000); model fit/misfit (1PL model and 3PL model with c = .15); impact (no difference/mean differences/variance differences/mean and variance differences); and percentage of items with uniform and nonuniform DIF (0%/10%/20%). In general, the results indicate the importance of ensuring model fit to achieve greater control of Type I error and adequate statistical power. The manipulated variables produced inflated Type I error rates, which were well controlled when a measure of DIF magnitude was applied. Sample size ratio also had an effect on the power of the procedure. The paper discusses the practical implications of these results.
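A difficulty-parameter-difference check of the kind studied above contrasts Rasch item difficulties calibrated separately for each group. The sketch below assumes the difficulties and their standard errors are already available in logits; it uses a normal approximation for the test statistic rather than whatever exact test Winsteps applies, and the default `min_logits` magnitude criterion of 0.5 is an illustrative assumption, not a value from the study. It shows how pairing a significance test with a magnitude criterion can keep small but statistically significant contrasts from being flagged, the kind of Type I error control the abstract attributes to a DIF magnitude measure.

```python
import math


def rasch_dif_contrast(b_ref, se_ref, b_focal, se_focal,
                       z_crit=1.96, min_logits=0.5):
    """Flag DIF from separately calibrated Rasch difficulties (logits).

    Returns (contrast, z, flagged). A positive contrast means the item is
    harder for the focal group. z uses a normal approximation; min_logits
    is an illustrative magnitude criterion (an assumption, not a standard).
    """
    contrast = b_focal - b_ref
    z = contrast / math.sqrt(se_ref ** 2 + se_focal ** 2)
    # Require both statistical significance and a non-trivial size.
    flagged = abs(z) > z_crit and abs(contrast) >= min_logits
    return contrast, z, flagged
```

With large samples the standard errors shrink, so a 0.2-logit contrast can be significant yet remain unflagged under the magnitude criterion, while a 0.6-logit contrast with moderate standard errors is flagged.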

16.
Gender effects in large-scale assessments have become an increasingly important research area within and across countries. Yet few studies have linked differences in assessment results of male and female students in higher education to construct-relevant features of the target construct. This paper examines gender effects on students’ economic content knowledge with a focus on construct-relevant explanations. Moreover, we compare gender effects cross-nationally between Germany, Japan, and the United States. To assess economic content knowledge of higher education students, we used translated, adapted, and validated versions of the Test of Understanding in College Economics (TUCE, 4th ed.), an instrument that is commonly used internationally. We found gender effects on test scores in all three countries; effects were larger in Germany and the United States than in Japan. Gender effects were generally more pronounced on the numeracy subscale than on the literacy subscale, that is, male students had a greater edge over female students when items required calculations. In our conclusion, we discuss how numeracy and literacy items may tap different abilities.

17.
Differential item functioning (DIF) analyses have been the primary method used in large-scale assessments to examine fairness for subgroups. Currently, DIF analyses use manifest methods, grouping examinees by observed characteristics such as gender and race/ethnicity. These methods assume homogeneity of item responses, that is, that all examinees respond to test items using a similar approach. This assumption may not hold for all groups. In this study, we demonstrate the first application of the latent class (LC) approach to investigate DIF and its sources in heterogeneous populations (linguistic minority groups). We found at least three LCs within each linguistic group, suggesting the need to empirically evaluate this assumption in DIF analysis. We obtained larger proportions of DIF items, with larger effect sizes, when LCs within language groups were examined than when the overall (majority/minority) language groups were examined. The illustrated approach could be used to improve the ways in which DIF analyses are typically conducted, enhancing DIF detection accuracy and score-based inferences when analyzing DIF in heterogeneous populations.

18.
Differential item functioning (DIF) may be caused by an interaction of multiple manifest grouping variables or by unexplored manifest variables, which cannot be detected by conventional DIF detection methods based on a single manifest grouping variable. Such DIF may be detected by a latent approach using a mixture item response theory model and subsequently explained by multiple manifest variables. This study facilitates the interpretation of latent DIF with the use of background and cognitive variables. The PISA 2009 reading assessment and student survey are analyzed. Results show that members of manifest groups were not homogeneously advantaged or disadvantaged and that a single manifest grouping variable was not a sufficient proxy for latent DIF. This study also demonstrates that DIF items arising from the interaction of multiple variables can be effectively screened by the latent DIF analysis approach. Background and cognitive variables jointly predicted latent class membership well.

19.
Increasingly, tests are being translated and adapted into different languages. Differential item functioning (DIF) analyses are often used to identify non-equivalent items across language groups. However, few studies have focused on understanding why some translated items produce DIF. The purpose of the current study is to identify sources of differential item and bundle functioning on translated achievement tests using substantive and statistical analyses. A substantive analysis of existing DIF items was conducted by an 11-member committee of testing specialists. In their review, four sources of translation DIF were identified. Two certified translators used these four sources to categorize a new set of DIF items from Grade 6 and 9 Mathematics and Social Studies Achievement Tests. Each item was associated with a specific source of translation DIF and each item was anticipated to favor a specific group of examinees. Then, a statistical analysis was conducted on the items in each category using SIBTEST. The translators sorted the mathematics DIF items into three sources, and they correctly predicted the group that would be favored for seven of the eight items or bundles of items across two grade levels. The translators sorted the social studies DIF items into four sources, and they correctly predicted the group that would be favored for eight of the 13 items or bundles of items across two grade levels. The majority of items in mathematics and social studies were associated with differences in the words, expressions, or sentence structure of items that are not inherent to the language and/or culture. By combining substantive and statistical DIF analyses, researchers can study the sources of DIF and create a body of confirmed DIF hypotheses that may be used to develop guidelines and test construction principles for reducing DIF on translated tests.

20.
This article examines nonmathematical linguistic complexity as a source of differential item functioning (DIF) in math word problems for English language learners (ELLs). Specifically, this study investigates the relationship between item measures of linguistic complexity, nonlinguistic forms of representation, and DIF measures based on item response theory difficulty parameters in a state fourth-grade math test. This study revealed that the greater an item's nonmathematical lexical and syntactic complexity, the greater the differences in difficulty parameter estimates favoring non-ELLs over ELLs. However, the impact of linguistic complexity on DIF is attenuated when items provide nonlinguistic schematic representations that help ELLs make meaning of the text, suggesting that including such representations could help mitigate the negative effect of increased linguistic complexity in math word problems.
