期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

题库建设中的试题沉积问题及其应对策略分析 总被引：4，自引：2，他引：2

高升《中国考试》2010,(3)

试题沉积是题库建设必须面对和解决的问题。造成这种现象的原因多种多样,如试题入库把关不严、难度不合组卷要求、新旧题型发生变化、试题已经老化过时以及库存试题分布不均,等等。针对这些原因,本文逐一提出解决办法,如强化教师培训、严格入库标准、题型翻新改造、控制试题难度以及提高命题工作的前瞻性,等等。通过这些措施,努力减少试题沉积,提高入库试题使用率,实现题库建设的又好又快发展。相似文献

2.

Multilevel Modeling of Item Position Effects

Anthony D. Albano 《Journal of Educational Measurement》2013,50(4):408-426

In many testing programs it is assumed that the context or position in which an item is administered does not have a differential effect on examinee responses to the item. Violations of this assumption may bias item response theory estimates of item and person parameters. This study examines the potentially biasing effects of item position. A hierarchical generalized linear model is formulated for estimating item‐position effects. The model is demonstrated using data from a pilot administration of the GRE wherein the same items appeared in different positions across the test form. Methods for detecting and assessing position effects are discussed, as are applications of the model in the contexts of test development and item analysis. 相似文献

3.

阅读理解多项选择题项目分析

李雪《牡丹江教育学院学报》2010,(6):159-161

为保证语言测试题目的质量和加强题库建设,本文基于经典测试理论,使用Gitest Ⅲ对一份高考试卷（阅读部分）题目进行项目分析,结果显示：该阅读题目的难度、区分度较理想,但难度分布并不理想。建议在使用题库中的组合试卷前先进行试测,以改进试题的难度分布以及部分题目选项的质量,从而提高试题的信度和效度。相似文献

4.

Using Differential Item Functioning Procedures to Explore Sources of Item Difficulty and Group Performance Characteristics 总被引：1，自引：0，他引：1

Janice Dowd Scheuneman Kalle Gerritz 《Journal of Educational Measurement》1990,27(2):109-131

Statistics used to detect differential item functioning can also reflect differential strengths and weaknesses in the performance characteristics of population subgroups. In turn, item features associated with the differential performance patterns are likely to reflect some facet of the item task and hence its difficulty, that might previously have been overlooked. In this study, several item features were identified and coded for a large number of reading comprehension items from the two admissions testing programs. Item features included subject matter content, various properties of item structure, cognitive demand indicators, and semantic content (propositional analysis). Differential item functioning was evaluated for males and females and for White and Black examinees. Results showed a number of significant relationships between item features and indicators of differential item functioning—many of which were consistent across testing programs. Implications of the results for related areas of research are discussed. 相似文献

5.

论高等教育自学考试试题难度的控制 总被引：1，自引：0，他引：1

王晓华《中国考试》2008,(1):48-54

高等教育自学考试试题难度的控制是命题工作中的主要内容和核心问题之一。本文论述了自学考试试题难度控制的意义、要求、基本措施以及试题难度调控的基本方法。特别指出自学考试大纲中对试题难度层次分值比例规定的不尽合理之处.并提出了相应的调整建议。分析了不同命题质量控制措施对试题难度控制的不同作用。最后,结合实例论述了试题难度调控的基本技术方法。相似文献

6.

试题命制的理论和技术(二) 总被引：1，自引：0，他引：1

雷新勇周群《考试研究》2008,(2):90-106

大规模教育考试试题命制以心理学的某些理论假设为基础。与这些理论假设一致的试题定义要求试题应该具备三个要素,即测量目标、刺激情境和设问,这三个要素缺失了任何一个,都不能构成完整的试题。根据这些理论假设以及试题定义和要素,本文讨论了命制客观题和主观题的基本要求,客观题包括题干的要求、选项设置的要求以及选项数的问题;主观题包括情境材料的选择、设问、赋分和评分标准制定。相似文献

7.

Minimizing the Influence of Item Parameter Estimation Errors in Test Development: A Comparison of Three Selection Procedures

Mark J. Gierl Dianne Henderson Michael Jodoin Don Klinger 《Journal of Experimental Education》2013,81(3):261-279

In test development, item response theory (IRT) is a method to determine the amount of information that each item (i.e., item information function) and combination of items (i.e., test information function) provide in the estimation of an examinee's ability. Studies investigating the effects of item parameter estimation errors over a range of ability have demonstrated an overestimation of information when the most discriminating items are selected (i.e., item selection based on maximum information). In the present study, the authors examined the influence of item parameter estimation errors across 3 item selection methods—maximum no target, maximum target, and theta maximum—using the 2- and 3-parameter logistic IRT models. Tests created with the maximum no target and maximum target item selection procedures consistently overestimated the test information function. Conversely, tests created using the theta maximum item selection procedure yielded more consistent estimates of the test information function and, at times, underestimated the test information function. Implications for test development are discussed. 相似文献

8.

The Stability of IRT b Values

Robert C. Sykes Anne R. Fitzpatrick 《Journal of Educational Measurement》1992,29(3):201-211

This study investigated possible explanations for an observed change in Rasch item parameters (b values) obtained from consecutive administrations of a professional licensure examination. Considered in this investigation were variables related to item position, item type, item content, and elapsed time between administrations of the item. An analysis of covariance methodology was used to assess the relations between these variables and change in item b values, with the elapsed time index serving to control for differences that could be attributed to average or pool changes in b values over time. A series of analysis of covariance models were fitted to the data in an attempt to identify item characteristics that were significantly related to the change in b values after the time elapsed between item administrations had been controlled. The findings indicated that the change in item b values was not related either to item position or to item type. A small, positive relationship between this change and elapsed time indicated that the pool b values were increasing over time. A test of simple effects suggested the presence of greater change for one of the content categories analyzed. These findings are interpreted, and suggestions for future research are provided. 相似文献

9.

Identifying Potential Test Item Misalignment Using Student Verbal Reports

Jacqueline P. Leighton Ph.D. R. Psych Rebecca J. Gokiert 《Educational Assessment》2013,18(4):215-242

The purpose of the present investigation was to identify the relationship among different indicators of uncertainty that lead to potential item misalignment. The item-based indicators included ratings of ambiguity and cognitive complexity. The student-based indicators included (a) frequency of cognitive monitoring per item, (b) levels of misinterpretation per item, and (c) levels of lack of confidence per item. Results indicate that item cognitive complexity was related to all student-based indicators even after controlling for students' performance on the item. Moreover, item ambiguity was related to levels of item misinterpretation but not to frequency of student cognitive monitoring or lack of confidence. The implications of these conclusions for identifying item misalignment are discussed in light of construct-relevant and construct-irrelevant sources of ambiguity. 相似文献

10.

The Impact of Three Factors on the Recovery of Item Parameters for the Three-Parameter Logistic Model

Kyung Yong Kim Won-Chan Lee 《教育实用测度》2017,30(3):228-242

相似文献

11.

试题命制的理论和技术(一) 总被引：1，自引：0，他引：1

雷新勇周群《考试研究》2008,(1):84-97

大规模教育考试试题命制是以心理学的某些理论假设为基础。与这些理论假设一致的试题定义要求试题应该具备三个要素:测量目标、刺激情境和设问,这三个要素缺失了任何一个,都不能构成完整的试题。根据这些理论假设以及试题定义和要素,本文讨论了命制客观题和主观题的基本要求,客观题包括题干的要求、选项设置的要求以及选项数的问题;主观题包括情境材料的选择、设问、赋分和评分标准制定。相似文献

12.

A Didactic Explanation of Item Bias, Item Impact, and Item Validity From a Multidimensional Perspective

Terry A. Ackerman 《Journal of Educational Measurement》1992,29(1):67-91

Many researchers have suggested that the main cause of item bias is the misspecification of the latent ability space, where items that measure multiple abilities are scored as though they are measuring a single ability. If two different groups of examinees have different underlying multidimensional ability distributions and the test items are capable of discriminating among levels of abilities on these multiple dimensions, then any unidimensional scoring scheme has the potential to produce item bias. It is the purpose of this article to provide the testing practitioner with insight about the difference between item bias and item impact and how they relate to item validity. These concepts will be explained from a multidimensional item response theory (MIRT) perspective. Two detection procedures, the Mantel-Haenszel (as modified by Holland and Thayer, 1988) and Shealy and Stout's Simultaneous Item Bias (SIB; 1991) strategies, will be used to illustrate how practitioners can detect item bias. 相似文献

13.

An NCME Instructional Module on Latent DIF Analysis Using Mixture Item Response Models

Sun‐Joo Cho Youngsuk Suh Woo‐yeol Lee 《Educational Measurement》2016,35(1):48-61

The purpose of this ITEMS module is to provide an introduction to differential item functioning (DIF) analysis using mixture item response models. The mixture item response models for DIF analysis involve comparing item profiles across latent groups, instead of manifest groups. First, an overview of DIF analysis based on latent groups, called latent DIF analysis, is provided and its applications in the literature are surveyed. Then, the methodological issues pertaining to latent DIF analysis are described, including mixture item response models, parameter estimation, and latent DIF detection methods. Finally, recommended steps for latent DIF analysis are illustrated using empirical data. 相似文献

14.

Item Position and Item Difficulty Change in an IRT-Based Common Item Equating Design

Jason L. Meyers G. Edward Miller Walter D. Way 《教育实用测度》2013,26(1):38-60

In operational testing programs using item response theory (IRT), item parameter invariance is threatened when an item appears in a different location on the live test than it did when it was field tested. This study utilizes data from a large state's assessments to model change in Rasch item difficulty (RID) as a function of item position change, test level, test content, and item format. As a follow-up to the real data analysis, a simulation study was performed to assess the effect of item position change on equating. Results from this study indicate that item position change significantly affects change in RID. In addition, although the test construction procedures used in the investigated state seem to somewhat mitigate the impact of item position change, equating results might be impacted in testing programs where other test construction practices or equating methods are utilized. 相似文献

15.

An NCME Instructional Module: Guidelines for the Development of Item Banks

Annie W. Ward Mildred Murray-Ward 《Educational Measurement》1994,13(1):34-39

Use of item banking technology can provide much relief for the chores associated with preparing assessments; it may also enhance the quality of the items and improue the quality of the assessments. Item banking programs provide for item entry and storage, item retrieval and test creation, and maintenance of the item history. Some programs also provide companion programs for scoring, analysis, and reporting. There are many item banking programs that may be purchased or leased, and there are banks of items auailable for purchase. This module is designed to help those who develop assessments of any kind to understand the process of item banking, to analyze their needs, and to find or develop programs and materials that meet those needs. It should be useful to teachers at all leuels of education and to school-district test directors who are responsible for developing district-wide tests. It may also provide some useful information for those who are responsible for large-scale assessment programs of all types. 相似文献

16.

Multiple-Choice Models: The Distractors Are Also Part of the Item

David Thissen Lynne Steinberg Anne R. Fitzpatrick 《Journal of Educational Measurement》1989,26(2):161-176

This paper describes an item response model for multiple-choice items and illustrates its application in item analysis. The model provides parametric and graphical summaries of the performance of each alternative associated with a multiple-choice item; the summaries describe each alternative's relationship to the proficiency being measured. The interpretation of the parameters of the multiple-choice model and the use of the model in item analysis are illustrated using data obtained from a pilot test of mathematics achievement items. The use of such item analysis for the detection of flawed items, for item design and development, and for test construction is discussed. 相似文献

17.

Detecting Differential Item Discrimination (DID) and the Consequences of Ignoring DID in Multilevel Item Response Models

下载免费PDF全文

Woo‐yeol Lee Sun‐Joo Cho 《Journal of Educational Measurement》2017,54(3):364-393

Cross‐level invariance in a multilevel item response model can be investigated by testing whether the within‐level item discriminations are equal to the between‐level item discriminations. Testing the cross‐level invariance assumption is important to understand constructs in multilevel data. However, in most multilevel item response model applications, the cross‐level invariance is assumed without testing of the cross‐level invariance assumption. In this study, the detection methods of differential item discrimination (DID) over levels and the consequences of ignoring DID are illustrated and discussed with the use of multilevel item response models. Simulation results showed that the likelihood ratio test (LRT) performed well in detecting global DID at the test level when some portion of the items exhibited DID. At the item level, the Akaike information criterion (AIC), the sample‐size adjusted Bayesian information criterion (saBIC), LRT, and Wald test showed a satisfactory rejection rate (>.8) when some portion of the items exhibited DID and the items had lower intraclass correlations (or higher DID magnitudes). When DID was ignored, the accuracy of the item discrimination estimates and standard errors was mainly problematic. Implications of the findings and limitations are discussed. 相似文献

18.

基于复合选择题的雷同答卷甄别探析——以某资格考试为例

孔祥常颖昊《中国考试》2021,(2)

基于某资格类考试考后数据,对包含单选题和多选题的复合选择题的雷同答卷进行分析,提出单选甄别、多选甄别和合并甄别3种甄别方案,设计不同测验长度、难度、被抄袭考生水平、题目抄袭比率、作弊考生比率及显著性水平的实验样本。研究结果显示:合并甄别和单选甄别都表现出较好的甄别性能,多选甄别由于多选题数量方面的劣势,甄别率低且Ⅰ型错误率高。据此,提出甄别雷同答卷的建议:如果对甄别出的雷同答卷考生进行违纪违规处理,可将单选甄别和合并甄别分别甄别出的结果取交集作为最后处理依据;如果不对甄别结果进行处理,可将2种方案分别甄别出的结果取并集,以此来对考区或考点的考风考纪进行监管。相似文献

19.

Instructional Topics in Educational Measurement (ITEMS) Module: Using Automated Processes to Generate Test Items

Mark J. Gierl Hollis Lai 《Educational Measurement》2013,32(3):36-50

Changes to the design and development of our educational assessments are resulting in the unprecedented demand for a large and continuous supply of content‐specific test items. One way to address this growing demand is with automatic item generation (AIG). AIG is the process of using item models to generate test items with the aid of computer technology. The purpose of this module is to describe and illustrate a template‐based method for generating test items. We outline a three‐step approach where test development specialists first create an item model. An item model is like a mould or rendering that highlights the features in an assessment task that must be manipulated to produce new items. Next, the content used for item generation is identified and structured. Finally, features in the item model are systematically manipulated with computer‐based algorithms to generate new items. Using this template‐based approach, hundreds or even thousands of new items can be generated with a single item model. 相似文献

20.

可转换债券转换价格条款的合理设计

王慧煜夏新平《华中科技大学学报(社会科学版)》2004,18(3):65-69

可转换债券发行是否成功关键在于可转换债券的发行条款的设计是否满足投资的需求。章通过比较分析17家公司的可转换债券发行条款中转换价格、转换价格调整条款，发现我国可转换债券发行条款设计大多雷同。并没有根据实际情况进行创新。章就完善转换价格条款设计提出相关建议。相似文献