期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Item Function Characteristics and Dimensionality for Alternative Response Formats in Mathematics

《教育实用测度》2013,26(3):257-275

The purpose of this study was to investigate the technical properties of stem-equivalent mathematics items differing only with respect to response format. Using socio- economic factors to define the strata, a proportional stratified random sample of 1,366 Connecticut sixth-grade students were administered one of three forms. Classical item analysis, dimensionality assessment, item response theory goodness-of-fit, and an item bias analysis were conducted. Analysis of variance and confirmatory factor analysis were used to examine the functioning of the items presented in the three different formats. It was found that, after equating forms, the constructed-response formats were somewhat more difficult than the multiple-choice format. However, there was no significant difference across formats with respect to item discrimination. A differential item functioning (DIF) analysis was conducted using both the Mantel-Haenszel procedure and the comparison of the item characteristic curves. The DIF analysis indicated that the presence of bias was not greatly affected by item format; that is, items biased in one format tended to be biased in a similar manner when presented in a different format, and unbiased items tended to remain so regardless of format. 相似文献

2.

Comparing student science performance between hands-on and traditional item types: A many-facet Rasch analysis

《Studies in Educational Evaluation》2021

This study aimed to compare student science performance between hands-on and traditional item types by investigating the item type effect and the interaction effect between item type and science content domain. In Shanghai, China, 2404 ninth-graders from six urban junior high schools took part in the study. The partial credit many-facet Rasch measurement analysis was used to examine the instrument's quality and investigate the item type effect and the interaction effect. The results showed that the traditional item type was significantly more difficult for participants than the hands-on item type, exhibiting a moderate-to-large effect size. Moderate or large interaction effects of an item type with a specific content domain on student science performance were also detected. Students performed better on some science content domains with a particular item type (either hands-on or traditional). Implications for assessment developers and science instructors were also discussed. 相似文献

3.

Validation of Cognitive Sensitivity for Item Response Curves

Kikumi K. Tatsuoka 《Journal of Educational Measurement》1987,24(3):233-245

This study sought a scientific way to examine whether item response curves are influenced systematically by the cognitive processes underlying solution of the items in a procedural domain (addition of fractions). Starting from an expert teacher's logical task analysis and prediction of various erroneous rules and sources of misconceptions, an error diagnostic program was developed. This program was used to carry out an error analysis of test performance by three samples of students. After the cognitive structure of the subtasks was validated by a majority of the students, the items were characterized by their underlying subtask patterns. It was found that item response curves for items in the same categories were significantly more homogeneous than those in different categories. In other words, underlying cognitive subtasks appeared to systematically influence the slopes and difficulties of item response curves. 相似文献

4.

The Effects of Dimensionality on Equating the Law School Admission Test

Gregory Camilli Ming-mei Wang Jacqueline Fesq 《Journal of Educational Measurement》1995,32(1):79-96

Using factor analysis, we conducted an assessment of multidimensionality for 6 forms of the Law School Admission Test (LSAT) and found 2 subgroups of items or factors for each of the 6 forms. The main conclusion of the factor analysis component of this study was that the LSAT appears to measure 2 different reasoning abilities: inductive and deductive. The technique of N. J. Dorans & N. M. Kingston (1985) was used to examine the effect of dimensionality on equating. We began by calibrating (with item response theory [IRT] methods) all items on a form to obtain Set I of estimated IRT item parameters. Next, the test was divided into 2 homogeneous subgroups of items, each having been determined to represent a different ability (i.e., inductive or deductive reasoning). The items within these subgroups were then recalibrated separately to obtain item parameter estimates, and then combined into Set II. The estimated item parameters and true-score equating tables for Sets I and II corresponded closely. 相似文献

5.

Computerized Adaptive Testing in Early Education: Exploring the Impact of Item Position Effects on Ability Estimation

Anthony D. Albano Liuhan Cai Erin M. Lease Scott R. McConnell 《Journal of Educational Measurement》2019,56(2):437-451

Studies have shown that item difficulty can vary significantly based on the context of an item within a test form. In particular, item position may be associated with practice and fatigue effects that influence item parameter estimation. The purpose of this research was to examine the relevance of item position specifically for assessments used in early education, an area of testing that has received relatively limited psychometric attention. In an initial study, multilevel item response models fit to data from an early literacy measure revealed statistically significant increases in difficulty for items appearing later in a 20‐item form. The estimated linear change in logits for an increase of 1 in position was .024, resulting in a predicted change of .46 logits for a shift from the beginning to the end of the form. A subsequent simulation study examined impacts of item position effects on person ability estimation within computerized adaptive testing. Implications and recommendations for practice are discussed. 相似文献

6.

A Unified Latent Growth Curve Model

Chueh-An Hsieh Alexander von Eye Kimberly Maier Hsin-Jung Hsieh Shi-Hsiung Chen 《Structural equation modeling》2013,20(4):592-615

Applying item response theory models to repeated observations has demonstrated great promise in developmental research. By allowing the researcher to take account of the characteristics of both item response and measurement error in longitudinal trajectory analysis, it improves the reliability and validity of latent growth curve analysis. This has enabled the study, to differentially weigh individual items and examine developmental stability and change over time, to propose a comprehensive modeling framework, combining a measurement model with a structural model. Despite a large number of components requiring attention, this study focuses on model formulation, evaluates the performance of the estimators of model parameters, incorporates prior knowledge from Bayesian analysis, and applies the model using an illustrative example. It is hoped that this fundamental study can demonstrate the breadth of this unified latent growth curve model. 相似文献

7.

Instructionally Sensitive Psychometrics: Application of a New IRT-Based Detection Technique to Mathematics Achievement Test Items

Bengt O. Muthén Chih-Fen Kao Leigh Burstein 《Journal of Educational Measurement》1991,28(1):1-22

Achievement modeling is carried out in groups of students characterized by heterogeneous instructional background. Extensions of item response theory models incorporate variables reflecting different amounts of opportunity-to-leam (OTL). The effects of these OTL variables are studied with respect to their influence on both the latent trait and the item performance directly. Such direct effects may reflect instructionally sensitive items. U.S. eighth-grade mathematics data from the Second International Mathematics Study are analyzed. Here, the same test is taken by students enrolled in typical instruction and students enrolled in elementary algebra classes. It is shown that the new analysis provides a more detailed way to examine the influence of instruction on responses to test items than does conventional item response theory. 相似文献

8.

ESTIMATING THE RELIABILITY OF MULTIPLE TRUE-FALSE TESTS

DAVID A. FRISBIE CYNTHIA A. DRUVA 《Journal of Educational Measurement》1986,23(2):99-105

This study was designed to examine the level of dependence within multiple true-false (MTF) test item clusters by computing sets of item intercorrelations with data from a test composed of both MTF and multiple choice (MC) items. It was posited that internal analysis reliability estimates for MTF tests would be spurious due to elevated MTF within-cluster intercorrelations. Results showed that, on the average, MTF within-cluster dependence was no greater than that found between MTF items from different clusters, between MC items, or between MC and MTF items. But item for item, there was greater dependence between items within the same cluster than between items of different clusters. 相似文献

9.

高考改革更关注对教学的积极导向

HU Xiaoli 《中国考试》2008,(8)

考试可能有负面作用,也可以产生积极的影响,关键在于考试设计的指导思想。考试的形式和命题的取向主宰着教与学的模式。我国著名的考试问题研究专家桂诗春教授就曾指出,题型不变化,考试这把"刀"就"钝化"了,而要使之变得锋利,就要注意经常变换题型。新课程改革实验区高考英语试卷的变化,凸现了对学生分析、推理、概括和表达等较高层次能力的考查;题型的变化为广大教师深入领悟和实践高中英语新课程标准做出了正面引领的作用。相似文献

10.

Does Test Item Performance Increase with Test-to-Standards Alignment?

Anne Traynor 《Educational Assessment》2017,22(3):171-188

Variation in test performance among examinees from different regions or national jurisdictions is often partially attributed to differences in the degree of content correspondence between local school or training program curricula, and the test of interest. This posited relationship between test-curriculum correspondence, or “alignment,” and test performance is usually inferred from highly distal evidence, rather than directly examined. Utilizing mathematics standards content analysis data and achievement test item data from ten U.S. states, we examine the relationship between topic-specific alignment and test item performance. When a particular item’s content type is emphasized by the standards, we find evidence of a positive relationship between the alignment measure and proportion-correct test item difficulty, although this effect is not consistent across samples. Implications of the results for curricular achievement test development and score interpretation are discussed. 相似文献

11.

Consistency of Angoff-Based Predictions of Item Performance: Evidence of Technical Quality of Results From the Angoff Standard Setting Method

Barbara S. Plake James C. Impara Patrick M. Irwin 《Journal of Educational Measurement》2000,37(4):347-355

Judgmental standard-setting methods, such as the Angoff(1971) method, use item performance estimates as the basis for determining the minimum passing score (MPS). Therefore, the accuracy, of these item peformance estimates is crucial to the validity of the resulting MPS. Recent researchers (Shepard, 1995; Impara & Plake, 1998; National Research Council. 1999) have called into question the ability of judges to make accurate item performance estimates for target subgroups of candidates, such as minimally competent candidates. The propose of this study was to examine the intra- and inter-rater consistency of item performance estimates from an Angoff standard setting. Results provide evidence that item pelformance estimates were consistent within and across panels within and across years. Factors that might have influenced this high degree of reliability, in the item performance estimates in a standard setting study are discussed. 相似文献

12.

Differential Performance by English Language Learners on an Inquiry-Based Science Assessment

Sultan Turkan 《International Journal of Science Education》2013,35(15):2343-2369

The performance of English language learners (ELLs) has been a concern given the rapidly changing demographics in US K-12 education. This study aimed to examine whether students' English language status has an impact on their inquiry science performance. Differential item functioning (DIF) analysis was conducted with regard to ELL status on an inquiry-based science assessment, using a multifaceted Rasch DIF model. A total of 1,396 seventh- and eighth-grade students took the science test, including 313 ELL students. The results showed that, overall, non-ELLs significantly outperformed ELLs. Of the four items that showed DIF, three favored non-ELLs while one favored ELLs. The item that favored ELLs provided a graphic representation of a science concept within a family context. There is some evidence that constructed-response items may help ELLs articulate scientific reasoning using their own words. Assessment developers and teachers should pay attention to the possible interaction between linguistic challenges and science content when designing assessment for and providing instruction to ELLs. 相似文献

13.

Examining the Validity of GOLD® With 4-Year-Old Dual Language Learners

Do-Hong Kim Richard G. Lambert Sean Durham Diane C. Burts 《Early education and development》2018,29(4):477-493

Research Findings: This study builds on prior work related to the assessment of young dual language learners (DLLs). The purposes of the study were to (a) determine whether latent subgroups of preschool DLLs would replicate those found previously and (b) examine the validity of GOLD^® by Teaching Strategies with empirically derived subgroups. Latent class analysis confirmed previous findings of 3 distinct latent subgroups of DLLs (bilingual children, emergent bilingual children, and heritage language speakers). Results of differential item functioning analysis showed that with few exceptions, GOLD items functioned similarly, which indicates that groups matched on ability were similar in their item scores. The item pertaining to using conventional grammar consistently favored non-DLLs over heritage language speakers. The item pertaining to name writing consistently favored DLLs as a single group, emergent bilingual children, and heritage language speakers. Practice or Policy: Study results provide further support for the heterogeneity of DLLs and the use of GOLD with DLL subgroups. This provides the field with an opportunity to better understand this special population of children and enables teachers to plan with greater precision experiences that contribute to their development and learning. 相似文献

14.

Multiple-Choice Models: The Distractors Are Also Part of the Item

David Thissen Lynne Steinberg Anne R. Fitzpatrick 《Journal of Educational Measurement》1989,26(2):161-176

This paper describes an item response model for multiple-choice items and illustrates its application in item analysis. The model provides parametric and graphical summaries of the performance of each alternative associated with a multiple-choice item; the summaries describe each alternative's relationship to the proficiency being measured. The interpretation of the parameters of the multiple-choice model and the use of the model in item analysis are illustrated using data obtained from a pilot test of mathematics achievement items. The use of such item analysis for the detection of flawed items, for item design and development, and for test construction is discussed. 相似文献

15.

An NCME Instructional Module on Latent DIF Analysis Using Mixture Item Response Models

Sun‐Joo Cho Youngsuk Suh Woo‐yeol Lee 《Educational Measurement》2016,35(1):48-61

The purpose of this ITEMS module is to provide an introduction to differential item functioning (DIF) analysis using mixture item response models. The mixture item response models for DIF analysis involve comparing item profiles across latent groups, instead of manifest groups. First, an overview of DIF analysis based on latent groups, called latent DIF analysis, is provided and its applications in the literature are surveyed. Then, the methodological issues pertaining to latent DIF analysis are described, including mixture item response models, parameter estimation, and latent DIF detection methods. Finally, recommended steps for latent DIF analysis are illustrated using empirical data. 相似文献

16.

Curricular Differences and Unidimensionality of Achievement Test Data: An Exploratory Analysis

S. E. Phillips William A. Mehrens 《Journal of Educational Measurement》1987,24(1):1-16

The purpose of this study was to investigate whether a linear factor analytic method commonly used to investigate violation of the item response theory (IRT) unidimensionality assumption is sensitive to measurable curricular differences within a school district and to examine the possibility of differential item performance for groups of students receiving different instruction. For grades 3 and 6 in reading and mathematics, personnel from two midwestern school systems that regularly administer standardized achievement tests identified the formal textbook series used and provided ratings of test-instructional match for each school building (classroom). For both districts, the factor analysis results suggested no differences in percentages of variance for large first factors and relatively small second factors across ratings or series groups. The IRT analyses indicated little, if any, differential item performance for curricular subgroups. Thus, the impact of factors that might be related to curricular differences was judged to be minor. 相似文献

17.

对大学生科研训练的实践与思考 总被引：8，自引：8，他引：8

杨宏伟《实验技术与管理》2006,23(1):15-16

介绍通过具体的科研训练，对学生动手能力、项目分析和查阅资料等多种能力的培养过程，在大学生中建立诚信体系和科学道德规范的实践过程。同时，实践表明，大学生科研训练计划是检验教师教学与学生理论联系实际的一种值得推广的好方法。相似文献

18.

中央民族大学预科部考核方式解决方案

黄宁《民族教育研究》2005,16(3):10-15

考试是检验教与学效果的重要手段,试题库是试卷的基础,试卷分析法是检验试卷合理性与详细分析考试结果的方法。建立试题库及从中抽取试题时应遵循不重复、不遗漏、均衡分配得分、题型多样等原则;抽取试题方式要注意题型控制、章节控制;试卷分析法的三种图表对了解学生和改进教学有很大的帮助。相似文献

19.

Sources of difficulty in assessment: example of PISA science items

Florence Le Hebel Pascale Montpied Andrée Tiberghien Valérie Fontanieu 《International Journal of Science Education》2013,35(4):468-487

ABSTRACT

The understanding of what makes a question difficult is a crucial concern in assessment. To study the difficulty of test questions, we focus on the case of PISA, which assesses to what degree 15-year-old students have acquired knowledge and skills essential for full participation in society. Our research question is to identify PISA science item characteristics that could influence the item’s proficiency level. It is based on an a-priori item analysis and a statistical analysis. Results show that only the cognitive complexity and the format out of the different characteristics of PISA science items determined in our a-priori analysis have an explanatory power on an item’s proficiency levels. The proficiency level cannot be explained by the dependence/independence of the information provided in the unit and/or item introduction and the competence. We conclude that in PISA, it appears possible to anticipate a high proficiency level, that is, students’ low scores for items displaying a high cognitive complexity. In the case of a middle or low cognitive complexity level item, the cognitive complexity level is not sufficient to predict item difficulty. Other characteristics play a crucial role in item difficulty. We discuss anticipating the difficulties in assessment in a broader perspective. 相似文献

20.

Assessing risk of commercial sexual exploitation among children involved in the child welfare system

《Child abuse & neglect》2019

The objective of this study was to assess item characteristics indicative of the severity of risk for commercial sexual exploitation among a high-risk population of child welfare system involved youth to inform the construction of a screening tool. Existing studies have discerned factors that differentiate Commercial Sexual Exploitation of Children (CSEC) victims from sexual abuse victims, yet no research has been conducted to discriminate which items in a high risk population of youth are most predictive of CSEC. Using the National Survey of Child and Adolescent Well-Being (NSCAW) cohorts I and II, we examined responses from 1063 males and 1355 females ages 11 and older, over three interview periods.A 2-parameter logistic Item Response Theory (2 PL IRT) model was employed in order to examine item performance as potential indicators for the severity of risk for CSEC. Differential Item Functioning (DIF) analysis was conducted in order to examine potential differences in item responses based on gender. Modeling strategies to assess item difficulty and discrimination were outlined and Item Characteristic Curves for the final retained items were presented. Evidence for uniform DIF were present within items that asked about runaway, any drug use, suicidality, and experiencing severe violence. Results from this study can inform the construction of a screening instrument to assess the severity of risk for experiencing CSEC. 相似文献