Similar Literature
20 similar documents retrieved.
1.
Item Response Theory (IRT), also known as latent trait theory, is a modern measurement theory that developed out of criticism of, and efforts to overcome, the shortcomings of classical test theory (CTT).

2.
I. Critique and Motivation. Current item-analysis methods examine item difficulty and discrimination essentially on the basis of classical test theory, and they yield only a single difficulty value and a single discrimination value regardless of whether an item is dichotomously or polytomously scored. Even if this limitation is not obvious for dichotomous items, it is plain for polytomous ones: each response step (score category) of a polytomous item has its own characteristics, and summarizing them with one index inevitably conceals or erases much useful information, largely forfeiting the diagnostic value that polytomous items can offer. Because IRT (item response theory) is systematically rigorous and methodologically flexible, it captures the dynamic relationship between examinee ability (the latent trait) and item responses better than classical test theory does. IRT polytomous response models …
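To make the point concrete: in Samejima's graded response model every score boundary of an item carries its own location parameter, so no step is collapsed into a single difficulty value. In standard notation (not drawn from this abstract), for item \(j\) with categories \(0, \dots, m_j\):

\[
P^{*}_{jk}(\theta) = \frac{1}{1 + e^{-a_j(\theta - b_{jk})}}, \qquad
P_{jk}(\theta) = P^{*}_{jk}(\theta) - P^{*}_{j,k+1}(\theta),
\]

with \(P^{*}_{j0}(\theta) \equiv 1\) and \(P^{*}_{j,m_j+1}(\theta) \equiv 0\); here \(a_j\) is the item discrimination and \(b_{j1} < \dots < b_{jm_j}\) are per-step boundary locations, one for each score-category transition.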

3.
The attribute hierarchy method based on the graded response model and the generalized distance discrimination method for polytomous scoring are two IRT-based cognitive diagnosis methods for polytomously scored items. This paper compares the two methods by Monte Carlo simulation. Across four attribute hierarchy structures and four examinee slip rates, the results show that for the divergent structure the polytomous generalized distance discrimination method is relatively better; for the unstructured hierarchy the GRM-based attribute hierarchy method is the better choice; and for the convergent and linear structures the polytomous generalized distance discrimination method is more suitable when the slip rate is large, while in the remaining conditions the two methods perform about equally well.
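The abstract does not report the simulation design in detail. As a rough sketch of how such a Monte Carlo study generates data, the code below simulates graded responses under a GRM and injects response slips at a chosen rate; every parameter value and the slip mechanism are hypothetical placeholders, not the authors' settings.

```python
import numpy as np

rng = np.random.default_rng(7)

def grm_probs(theta, a, b):
    """Category probabilities of one graded-response item; b is an
    increasing vector of boundary locations."""
    pstar = 1.0 / (1.0 + np.exp(-a * (theta - b)))   # P(X >= k), k = 1..m
    cum = np.concatenate(([1.0], pstar, [0.0]))
    return cum[:-1] - cum[1:]                        # P(X = k), k = 0..m

def simulate(thetas, items, slip=0.2):
    """Simulate polytomous responses; with probability `slip`, replace
    the model-based response by a uniformly random category."""
    data = np.empty((len(thetas), len(items)), dtype=int)
    for i, th in enumerate(thetas):
        for j, (a, b) in enumerate(items):
            p = grm_probs(th, a, np.asarray(b))
            k = rng.choice(len(p), p=p)
            if rng.random() < slip:                  # examinee slip
                k = int(rng.integers(len(p)))
            data[i, j] = k
    return data

# hypothetical two-item bank: (discrimination, boundary locations)
items = [(1.2, [-1.0, 0.0, 1.0]), (0.8, [-0.5, 0.5, 1.5])]
data = simulate(rng.normal(size=500), items)
```

A diagnostic method's classification accuracy would then be scored against the attribute patterns used to generate `thetas`, repeated over replications and conditions.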

4.
This study applied the family of Bayesian IRT guessing models of Caojing et al. to analyze guessing behavior of second-year junior high school students on a Chinese vocabulary test, evaluated model fit with the DIC3 index, and compared the parameter estimates with those from the two-parameter logistic model. Findings: (1) the guessing models fit better than the two-parameter logistic model; (2) the test data were best described by the threshold guessing model (IRT-TG), with about 3.5% of students exhibiting TG-type guessing; (3) the presence of guessers clearly affects the guessers' own ability estimates and the item difficulty estimates, but has little effect on non-guessers' ability estimates or on the discrimination parameter estimates.

5.
崔维真 《考试研究》2012,(6):88-93,50
Building on earlier research, this study selected the unidimensional graded response model (GRM) for an experimental analysis of the oral component of the advanced-level Chinese proficiency test (HSK [Advanced]). The experimental hypothesis was that scoring under the graded response model can differentiate examinee ability more finely; the final results confirmed this hypothesis.

6.
Computerized adaptive testing (CAT) is an important area of modern testing research, but most existing work is based on dichotomous (0-1) scoring models, which limits its range of application. This paper investigates Samejima's graded response model and uses Delphi to develop a polytomously scored computerized adaptive test.
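The paper does not reproduce its selection algorithm here. A common core of a polytomous CAT, sketched below under that assumption, is to administer the not-yet-used item with maximum Fisher information under the graded response model at the current ability estimate; function names and the item bank format are illustrative.

```python
import numpy as np

def grm_item_info(theta, a, b):
    """Fisher information of one graded-response item at theta.
    a : discrimination; b : increasing boundary locations."""
    pstar = 1.0 / (1.0 + np.exp(-a * (theta - np.asarray(b))))
    cum = np.concatenate(([1.0], pstar, [0.0]))    # P(X >= k), k = 0..m+1
    dcum = np.concatenate(([0.0], a * pstar * (1.0 - pstar), [0.0]))
    pk = cum[:-1] - cum[1:]                        # category probabilities
    dpk = dcum[:-1] - dcum[1:]                     # their derivatives
    return float(np.sum(dpk**2 / np.maximum(pk, 1e-12)))

def pick_next_item(theta_hat, bank, administered):
    """Return the index of the unadministered item in `bank` (a list of
    (a, b) tuples) with maximum information at the current estimate."""
    best, best_info = None, -1.0
    for j, (a, b) in enumerate(bank):
        if j in administered:
            continue
        info = grm_item_info(theta_hat, a, b)
        if info > best_info:
            best, best_info = j, info
    return best
```

After each response the ability estimate is updated (e.g., by EAP or maximum likelihood) and the loop repeats until a stopping rule is met.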

7.
We began to study item response theory (IRT) in the second half of 1984. By July 1986 we had formally and successfully constructed the "Senior High School Mathematics Level Adaptive Test," applying the principles and methods of IRT fairly comprehensively in the test-construction process. We felt that, to better modernize examination methods and research on test theory, IRT deserved deeper study and efforts at improvement and innovation. I. A Brand-New Test Theory. Item Response Theory is a new test theory that developed rapidly only in the 1960s. As is well known, for tests constructed by classical methods, technical quality indices such as difficulty, discrimination, and reliability depend heavily on the sample: a test built this way is suitable only for examinee groups very similar to the sample used in the original tryout; otherwise the test results are hard to interpret correctly.

8.
Item Response Theory (IRT) developed out of efforts to overcome the limitations of classical test theory, and it shows its advantages under the assumptions of unidimensionality, local independence, and monotonicity. Grounded in latent trait theory, it models responses via the item characteristic curve, yielding three basic models: the normal ogive model, the Rasch model, and the logistic model. The paper identifies two areas in which IRT has made substantive progress in practice: computerized adaptive testing and cognitive diagnosis.
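For orientation, the three models named here have these standard forms (textbook notation, not drawn from the abstract) — the normal ogive, Rasch, and two-parameter logistic models, respectively:

\[
P_j(\theta) = \Phi\!\big(a_j(\theta - b_j)\big), \qquad
P_j(\theta) = \frac{1}{1 + e^{-(\theta - b_j)}}, \qquad
P_j(\theta) = \frac{1}{1 + e^{-D a_j(\theta - b_j)}},
\]

where \(\Phi\) is the standard normal distribution function, \(a_j\) and \(b_j\) are item discrimination and difficulty, and \(D \approx 1.7\) brings the logistic curve close to the normal ogive.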

9.
A Review of Methods for Estimating the Scoring Quality of Subjective Items   (cited 2 times: 0 self-citations, 2 by others)
In psychometric theory, the scoring quality of subjective (constructed-response) items is a topic worth studying. This paper introduces the methods offered by the three major measurement theories (classical test theory, generalizability theory, and item response theory) for estimating the scoring quality of subjective items and compares their strengths and weaknesses. Generalizability theory and item response theory have clear advantages in evaluating subjective-item scoring quality; how to use the three theories in combination to extract more valuable information about scoring quality deserves further exploration.

10.
Item response theory (IRT) and classical test theory (CTT) were used in a comparative study of examinee scores from two consecutive years of an academic proficiency examination in one subject. Drawing on IRT's sample invariance and test invariance, the study explores a psychometric method for referencing examinee ability and test difficulty to a single ability scale. The three-parameter logistic model (3PLM) fit the data well and is particularly suitable, for the academic proficiency tests of the new college entrance examination (gaokao), for evaluating test difficulty design, the measurement quality of items, and examinees' academic attainment. Endowing test scores with academic-quality meaning helps educational evaluation make a scientific turn: test difficulty design shifts from fitting a "suitable" examinee level to pursuing a "reasonable" match with academic level; item evaluation shifts from relying mainly on expert "experience" to relying mainly on "empirical" measurement analysis; and academic quality evaluation shifts from judging success by "rank order" to revealing "internal" structure and potential. Applied well, IRT offers a feasible path for breaking the entrenched "scores-only" mindset and can play an important role in evaluating the new gaokao academic proficiency examinations.
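For reference, the three-parameter logistic model named in the abstract takes the standard form

\[
P_j(\theta) = c_j + \frac{1 - c_j}{1 + e^{-D a_j(\theta - b_j)}},
\]

where \(a_j\), \(b_j\), and \(c_j\) are the discrimination, difficulty, and pseudo-guessing parameters of item \(j\), and \(D \approx 1.7\) is an optional scaling constant aligning the logistic curve with the normal ogive.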

11.
Some IRT models can be equivalently modeled in alternative frameworks such as logistic regression. Logistic regression can also model time-to-event data, which concerns the probability of an event occurring over time. Using the relation between time-to-event models and logistic regression and the relation between logistic regression and IRT, this article outlines how the nonparametric Kaplan-Meier estimator for time-to-event data can be applied to IRT data. Established Kaplan-Meier computational formulas are shown to aid in better approximating “parametric-type” item difficulty compared to existing nonparametric methods, particularly for the less-well-defined scenario wherein the response function is monotonic but invariant item ordering is unreasonable. Limitations and the potential for Kaplan-Meier within differential item functioning are also discussed.
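The article's mapping of item responses onto time-to-event data is its own contribution and is not reproduced here; the sketch below shows only the standard Kaplan-Meier estimator, \(\hat S(t) = \prod_{t_i \le t} (1 - d_i/n_i)\), that such computations build on.

```python
import numpy as np

def kaplan_meier(times, events):
    """Standard Kaplan-Meier estimator: S(t) = product over event times
    t_i <= t of (1 - d_i / n_i), where d_i is the number of events at
    t_i and n_i the number still at risk just before t_i.
    times  : event or censoring times
    events : 1 if the event occurred, 0 if censored
    Returns the distinct event times and the survival curve."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(times[events == 1])
    surv, s = [], 1.0
    for t in event_times:
        n_at_risk = np.sum(times >= t)
        d = np.sum((times == t) & (events == 1))
        s *= 1.0 - d / n_at_risk
        surv.append(s)
    return event_times, np.array(surv)
```

In the article's setting the time axis would be replaced by whatever ordered index item responses are mapped onto.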

12.
A polytomous item is one for which the responses are scored according to three or more categories. Given the increasing use of polytomous items in assessment practices, item response theory (IRT) models specialized for polytomous items are becoming increasingly common. The purpose of this ITEMS module is to provide an accessible overview of polytomous IRT models. The module presents commonly encountered polytomous IRT models, describes their properties, and contrasts their defining principles and assumptions. After completing this module, the reader should have a sound understanding of what a polytomous IRT model is, the manner in which the equations of the models are generated from the model's underlying step functions, how widely used polytomous IRT models differ with respect to their definitional properties, and how to interpret the parameters of polytomous IRT models.
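As one illustration of how polytomous models differ in the step functions they build on (standard formulations, not excerpts from the module): the graded response model accumulates cumulative boundary curves, as in item 2 above, whereas the generalized partial credit model is a "divide-by-total" model built from adjacent-category steps:

\[
P_{jk}(\theta) = \frac{\exp \sum_{v=1}^{k} a_j(\theta - b_{jv})}{\sum_{c=0}^{m_j} \exp \sum_{v=1}^{c} a_j(\theta - b_{jv})}, \qquad k = 0, \dots, m_j,
\]

where the empty sum for \(k = 0\) is zero and each step parameter \(b_{jv}\) marks the ability at which categories \(v-1\) and \(v\) are equally likely.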

13.
Both structural equation modeling (SEM) and item response theory (IRT) can be used for factor analysis of dichotomous item responses. In this case, the measurement models of both approaches are formally equivalent. They were refined within and across different disciplines, and make complementary contributions to central measurement problems encountered in almost all empirical social science research fields. In this article (a) fundamental formal similarities between IRT and SEM models are pointed out. It will be demonstrated how both types of models can be used in combination to analyze (b) the dimensional structure and (c) the measurement invariance of survey item responses. All analyses are conducted with Mplus, which allows an integrated application of both approaches in a unified, general latent variable modeling framework. The aim is to promote a diffusion of useful measurement techniques and skills from different disciplines into empirical social research.
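One concrete instance of the formal equivalence (a standard result for the normal-ogive model under the delta parameterization of categorical CFA, not a formula quoted from the article): with a standardized latent response variate \(y_j^{*} = \lambda_j \eta + \varepsilon_j\) dichotomized at threshold \(\tau_j\), the CFA parameters map to IRT parameters as

\[
a_j = \frac{\lambda_j}{\sqrt{1 - \lambda_j^{2}}}, \qquad b_j = \frac{\tau_j}{\lambda_j},
\]

so a factor loading and threshold estimated in Mplus translate directly into an IRT discrimination and difficulty.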

14.
Testing the goodness of fit of item response theory (IRT) models is relevant to validating IRT models, and new procedures have been proposed. These alternatives compare observed and expected response frequencies conditional on observed total scores, and use posterior probabilities for responses across θ levels rather than cross-classifying examinees using point estimates of θ and score responses. This research compared these alternatives with regard to their methods, properties (Type 1 error rates and empirical power), available research, and practical issues (computational demands, treatment of missing data, effects of sample size and sparse data, and available computer programs). Different advantages and disadvantages related to these characteristics are discussed. A simulation study provided additional information about empirical power and Type 1 error rates.
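Fit procedures that condition on observed total scores need the model-implied distribution of the number-correct score at a given θ. The classic building block is the Lord-Wingersky recursion, sketched here for dichotomous items as a generic illustration (not code from the study):

```python
import numpy as np

def score_distribution(p):
    """Lord-Wingersky recursion: P(number-correct score = s) for
    independent dichotomous items with success probabilities p at a
    fixed theta."""
    f = np.array([1.0])                # score distribution over 0 items
    for pj in p:
        g = np.zeros(len(f) + 1)
        g[:-1] += f * (1.0 - pj)       # item j answered incorrectly
        g[1:] += f * pj                # item j answered correctly
        f = g
    return f                           # f[s] = P(score = s | theta)
```

Integrating these conditional distributions over the θ distribution yields the expected frequencies at each observed total score that fit statistics of this family compare against the observed frequencies.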

15.
Drawing valid inferences from item response theory (IRT) models is contingent upon a good fit of the data to the model. Violations of model‐data fit have numerous consequences, limiting the usefulness and applicability of the model. This instructional module provides an overview of methods used for evaluating the fit of IRT models. Upon completing this module, the reader will have an understanding of traditional and Bayesian approaches for evaluating model‐data fit of IRT models, the relative advantages of each approach, and the software available to implement each method.

16.
17.
The posterior predictive model checking method is a flexible Bayesian model‐checking tool and has recently been used to assess fit of dichotomous IRT models. This paper extended previous research to polytomous IRT models. A simulation study was conducted to explore the performance of posterior predictive model checking in evaluating different aspects of fit for unidimensional graded response models. A variety of discrepancy measures (test‐level, item‐level, and pair‐wise measures) that reflected different threats to applications of graded IRT models to performance assessments were considered. Results showed that posterior predictive model checking exhibited adequate power in detecting different aspects of misfit for graded IRT models when appropriate discrepancy measures were used. Pair‐wise measures were found more powerful in detecting violations of the unidimensionality and local independence assumptions.
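As a generic skeleton of posterior predictive model checking (all function names are hypothetical placeholders, not the study's implementation): draw parameters from the posterior, simulate replicated data, and compare a discrepancy measure on replicated versus observed data.

```python
def ppp_value(posterior_draws, observed, simulate_data, discrepancy, rng):
    """Posterior predictive p-value for one discrepancy measure.
    posterior_draws : sequence of parameter draws from the posterior
    simulate_data(params, rng) -> a replicated data set
    discrepancy(data, params)  -> scalar realized discrepancy
    All names here are hypothetical placeholders."""
    exceed = 0
    for params in posterior_draws:
        replicated = simulate_data(params, rng)
        if discrepancy(replicated, params) >= discrepancy(observed, params):
            exceed += 1
    return exceed / len(posterior_draws)  # values near 0 or 1 signal misfit
```

Plugging in test-level, item-level, or pair-wise discrepancy measures targets the different aspects of misfit the study examines.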

18.
Simulation studies are extremely common in the item response theory (IRT) research literature. This article presents a didactic discussion of “truth” and “error” in IRT‐based simulation studies. We ultimately recommend that future research focus less on the simple recovery of parameters from a convenient generating IRT model, and more on practical comparative estimation studies when the data are intentionally generated to incorporate nuisance dimensionality and other sources of nuanced contamination encountered with real data. A new framework is also presented for conceptualizing and comparing various residuals in IRT studies. The new framework allows even very different calibration and scoring IRT models to be compared on a common, convenient, and highly interpretable number‐correct metric. Some illustrative examples are included.

19.
One of the major assumptions of item response theory (IRT) models is that performance on a set of items is unidimensional, that is, the probability of successful performance by examinees on a set of items can be modeled by a mathematical model that has only one ability parameter. In practice, this strong assumption is likely to be violated. An important pragmatic question to consider is: What are the consequences of these violations? In this research, evidence is provided of violations of unidimensionality on the verbal scale of the GRE Aptitude Test, and the impact of these violations on IRT equating is examined. Previous factor analytic research on the GRE Aptitude Test suggested that two verbal dimensions, discrete verbal (analogies, antonyms, and sentence completions) and reading comprehension, existed. Consequently, the present research involved two separate calibrations (homogeneous) of discrete verbal items and reading comprehension items as well as a single calibration (heterogeneous) of all verbal item types. Thus, each verbal item was calibrated twice and each examinee obtained three ability estimates: reading comprehension, discrete verbal, and all verbal. The comparability of ability estimates based on homogeneous calibrations (reading comprehension or discrete verbal) to each other and to the all-verbal ability estimates was examined. The effects of homogeneity of item calibration pool on estimates of item discrimination were also examined. Then the comparability of IRT equatings based on homogeneous and heterogeneous calibrations was assessed. The effects of calibration homogeneity on ability parameter estimates and discrimination parameter estimates are consistent with the existence of two highly correlated verbal dimensions. IRT equating results indicate that although violations of unidimensionality may have an impact on equating, the effect may not be substantial.

20.
Given the relationships of item response theory (IRT) models to confirmatory factor analysis (CFA) models, IRT model misspecifications might be detectable through model fit indexes commonly used in categorical CFA. The purpose of this study is to investigate the sensitivity of weighted least squares with adjusted means and variance (WLSMV)-based root mean square error of approximation, comparative fit index, and Tucker–Lewis Index model fit indexes to IRT models that are misspecified due to local dependence (LD). It was found that WLSMV-based fit indexes have some functional relationships to parameter estimate bias in 2-parameter logistic models caused by violations of LD. Continued exploration into these functional relationships and development of LD-detection methods based on such relationships could hold much promise for providing IRT practitioners with global information on violations of local independence.
