1.
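This study uses R 2.15.2 to simulate how the variance of anchor-test difficulty parameters affects test equating error, comparing three equating methods (chained equipercentile, Levine, and Tucker) across anchor tests with different types of difficulty variance. The results show that when the anchor test's difficulty variance is smaller than that of the total test, the random and systematic equating errors are essentially the same as when the two variances match (that is, when the anchor test is a parallel shortened version of the total test, a minitest). Requiring the anchor test to have the same statistical specifications as the total test may therefore be overly strict.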
2.
3.
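By studying the linear equating formula used in test equating, the authors obtain an improved linear equating formula. The improved formula depends not only on the correlation coefficient between the two tests but also closely on their reliabilities. The linear equating formula in common use today is the special case that arises when the two tests have equal reliability.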
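For reference, the linear equating function in common use, which this abstract presumably treats as the equal-reliability special case, is usually written as follows. This is the standard textbook form and not the improved formula, which the abstract does not reproduce. The symbols are assumptions introduced here: $\mu_X, \sigma_X$ and $\mu_Y, \sigma_Y$ are the means and standard deviations of scores on tests X and Y, and $l_Y(x)$ is the Y-scale equivalent of an X score $x$.

```latex
l_Y(x) = \mu_Y + \frac{\sigma_Y}{\sigma_X}\,(x - \mu_X)
```

The function maps an X score to the Y score that has the same standardized deviation from its mean.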
4.
5.
6.
7.
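Test Equating Research Group, Putian College (莆田高专测验等值研究课题组) 《Journal of Putian University (莆田学院学报)》1999,(1)
In the single-group design for test equating, the tests X and Y that are to be equated are both administered to the same group of examinees, and the scores are then equated. Its advantage is that, with only one examinee group, differences between X and Y scores are attributable to the tests themselves and are not confounded with differences between groups. Its drawback is that each examinee must be tested twice, so practice effects and fatigue can distort the equating results. This paper proposes a new design, the split-half single-group design: within a single-group design, X and Y are each split into two parallel halves, one half of X and one half of Y are combined into a new test Z, Z is administered to a single examinee group, and the equating conversion formula is derived from the results. Each examinee is tested only once, so the method keeps the advantage of the single-group design while overcoming its drawback.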
8.
9.
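Equating Error Theory and Error Control in the Equating of China's College Entrance Examination (Gaokao)  Total citations: 2, self-citations: 0, citations by others: 2
Dai Haiqi (戴海崎) 《Journal of Jiangxi Normal University (Philosophy and Social Sciences Edition)》1999,(1)
Test equating error is of two kinds: random error and systematic error. Random error arises from sampling, its magnitude depends mainly on sample size, and there are two methods for estimating it. The causes of systematic error are more complex; some systematic errors can be estimated by suitable methods, while others cannot be estimated at all. Earlier work on gaokao equating in China has tried to build error control into the design of the equating plan, data collection, anchor-item construction, and the computation of the equating relationship, with good results. The author recommends that future work control gaokao equating error more effectively through technical measures such as estimating the required sample size in advance, replacing anchor items on a planned schedule, designing equating paths carefully, and choosing an appropriate degree for the smoothing curve.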
10.
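Test equating makes different forms of an examination comparable and thereby keeps the test relatively stable across forms. IRT-based score equating is a parameter transformation carried out after the parameters have been estimated, and the stability of the equating results is closely tied to examinee sample size. Focusing on the reading subtest of the Hanyu Shuiping Kaoshi (HSK), this study uses real data to simulate a common-group anchor-test design, establishes a criterion equating, and examines how changes in examinee sample size affect the stability of IRT score equating. The results show that with about 2,000 examinees the equating results are fairly stable under every scheme. When the sample size increases further, equating error rises rather than falls.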
11.
NCME 2008 Presidential Address: The Impact of Anchor Test Configuration on Student Proficiency Rates
Anne R. Fitzpatrick 《Educational Measurement》2008,27(4):34-40
Examined in this study were the effects of reducing anchor test length on student proficiency rates for 12 multiple‐choice tests administered in an annual, large‐scale, high‐stakes assessment. The anchor tests contained 15 items, 10 items, or five items. Five content representative samples of items were drawn at each anchor test length from a small universe of items in order to investigate the stability of equating results over anchor test samples. The operational tests were calibrated using the one‐parameter model and equated using the mean b‐value method. The findings indicated that student proficiency rates could display important variability over anchor test samples when 15 anchor items were used. Notable increases in this variability were found for some tests when shorter anchor tests were used. For these tests, some of the anchor items had parameters that changed somewhat in relative difficulty from one year to the next. It is recommended that anchor sets with more than 15 items be used to mitigate the instability in equating results due to anchor item sampling. Also, the optimal allocation method of stratified sampling should be evaluated as one means of improving the stability and precision of equating results.
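As a rough illustration of the mean b‐value method named in this abstract, the sketch below assumes the usual one‐parameter (Rasch) formulation, in which the new form is placed on the old form's scale by adding a single constant: the mean anchor difficulty on the old form minus the mean anchor difficulty on the new form. The function names and the difficulty values are hypothetical and are not taken from the study.

```python
from statistics import mean


def mean_b_shift(old_anchor_b, new_anchor_b):
    """Additive constant that moves new-form difficulties onto the old-form scale.

    Assumes a one-parameter model, so only a shift (no slope) is estimated.
    Both lists must hold the same anchor items in the same order.
    """
    if not old_anchor_b or len(old_anchor_b) != len(new_anchor_b):
        raise ValueError("anchor lists must be non-empty and of equal length")
    return mean(old_anchor_b) - mean(new_anchor_b)


def rescale(b_values, shift):
    """Apply the shift to every difficulty estimate from the new form."""
    return [b + shift for b in b_values]


# Hypothetical anchor difficulties estimated separately in two years.
old_anchor_b = [-1.0, -0.5, 0.0, 0.5, 1.0]
new_anchor_b = [-0.75, -0.25, 0.25, 0.75, 1.25]

shift = mean_b_shift(old_anchor_b, new_anchor_b)
rescaled_anchor_b = rescale(new_anchor_b, shift)
print(shift, rescaled_anchor_b)
```

Because the shift is an average over the anchor items, drawing a different anchor sample changes the constant, which is the source of the variability in equating results that the abstract reports for short anchor sets.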
12.
Sandip Sinharay 《Educational Measurement》2018,37(2):64-69
The choice of anchor tests is crucial in applications of the nonequivalent groups with anchor test design of equating. Sinharay and Holland (2006, 2007) suggested “miditests,” which are anchor tests that are content‐representative and have the same mean item difficulty as the total test but have a smaller spread of item difficulties. Sinharay and Holland (2006, 2007), Cho, Wall, Lee, and Harris (2010), Fitzpatrick and Skorupski (2016), Liu, Sinharay, Holland, Curley, and Feigenbaum (2011a), Liu, Sinharay, Holland, Feigenbaum, and Curley (2011b), and Yi (2009) found the miditests to lead to better equating than minitests, which are representative of the total test with respect to content and difficulty. However, these findings recently came into question as Trierweiler, Lewis, and Smith (2016) concluded, based on a comparison of correlation coefficients of miditests and minitests with the total test, that making an anchor test a miditest does not generally increase the anchor to total score correlation and recommended the continuation of the practice of using minitests over miditests. Their recommendation raises the question, “Should miditests continue to be considered in practice?” This note defends the miditests by citing literature that favors miditests and then by showing that miditests perform as well as the minitests in most realistic situations considered in Trierweiler et al. (2016), which implies that miditests should continue to be seriously considered by equating practitioners.
13.
The study examined two approaches for equating subscores. They are (1) equating subscores using internal common items as the anchor to conduct the equating, and (2) equating subscores using equated and scaled total scores as the anchor to conduct the equating. Since equated total scores are comparable across the new and old forms, they can be used as an anchor to equate the subscores. Both chained linear and chained equipercentile methods were used. Data from two tests were used to conduct the study and results showed that when more internal common items were available (i.e., 10–12 items), then using common items to equate the subscores is preferable. However, when the number of common items is very small (i.e., five to six items), then using total scaled scores to equate the subscores is preferable. For both tests, not equating (i.e., using raw subscores) is not reasonable as it resulted in a considerable amount of bias.
14.
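Objective: To examine the effect of psychological behavior training on the volitional qualities of college students. Methods: A self-developed volitional quality scale for college students was administered to 38 college students who took part in psychological behavior training. Results: (1) Between pretest and posttest, the intervention group showed no significant differences on the decisiveness, self-awareness, and self-control factors or on the overall mean score (P > 0.05), but differed significantly on the perseverance factor (P < 0.05); (2) the control group showed no significant pretest-posttest differences on any factor score or on the overall mean score (P > 0.05); (3) at the immediate posttest, the intervention group differed significantly from the control group on every volitional quality factor (P < 0.05), and the difference in overall mean score was highly significant (P < 0.01); (4) at the long-term posttest, the two groups differed significantly on the self-awareness factor and the overall mean score (P < 0.05) but not on decisiveness, perseverance, or self-control (P > 0.05). Conclusion: Psychological behavior training can effectively improve college students' volitional qualities and can be widely applied in volitional quality education and mental health education at colleges and universities.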
15.
Daniel R. Eignor 《Educational Measurement》2008,27(4):30-33
This article discusses a particular type of concordance table and the potential for test score misuse that may result from employing such a table. The concordance that is discussed is typically created between scores on different, nonequatable versions of a test that share the same or close to the same test title. These concordance tables often appear in the context of relating scores on computerized adaptive and paper‐and‐pencil versions of the same test. When such a table is presented in a complete point‐by‐point fashion, relating each reported score on the scale of the new version of the test to a reported score on the scale of the old version of the test, test score users will typically treat the table as if it represented an equating of scores between the two versions, and directly replace scores on the new version of the test by scores on the old version. This clearly represents a misuse of the test scores. Suggestions for avoiding this misuse of test scores from concordance tables are provided.
16.
Björn Andersson 《Journal of Educational Measurement》2016,53(4):459-477
In observed‐score equipercentile equating, the goal is to make scores on two scales or tests measuring the same construct comparable by matching the percentiles of the respective score distributions. If the tests consist of different items with multiple categories for each item, a suitable model for the responses is a polytomous item response theory (IRT) model. The parameters from such a model can be utilized to derive the score probabilities for the tests and these score probabilities may then be used in observed‐score equating. In this study, the asymptotic standard errors of observed‐score equating using score probability vectors from polytomous IRT models are derived using the delta method. The results are applied to the equivalent groups design and the nonequivalent groups design with either chain equating or poststratification equating within the framework of kernel equating. The derivations are presented in a general form and specific formulas for the graded response model and the generalized partial credit model are provided. The asymptotic standard errors are accurate under several simulation conditions relating to sample size, distributional misspecification and, for the nonequivalent groups design, anchor test length.
17.
Haiwen Chen 《Journal of Educational Measurement》2012,49(3):269-284
In this article, linear item response theory (IRT) observed‐score equating is compared under a generalized kernel equating framework with Levine observed‐score equating for nonequivalent groups with anchor test design. Interestingly, these two equating methods are closely related despite being based on different methodologies. Specifically, when using data from IRT models, linear IRT observed‐score equating is virtually identical to Levine observed‐score equating. This leads to the conclusion that poststratification equating based on true anchor scores can be viewed as the curvilinear Levine observed‐score equating.
18.
Marie Wiberg, Wim J. van der Linden, Alina A. von Davier 《Journal of Educational Measurement》2014,51(1):57-74
Three local observed‐score kernel equating methods that integrate methods from the local equating and kernel equating frameworks are proposed. The new methods were compared with their earlier counterparts with respect to such measures as bias, as defined by Lord's criterion of equity, and percent relative error. The local kernel item response theory observed‐score equating method, which can be used for any of the common equating designs, had a small amount of bias, a low percent relative error, and a relatively low kernel standard error of equating, even when the accuracy of the test was reduced. The local kernel equating methods for the nonequivalent groups with anchor test generally had low bias and were quite stable against changes in the accuracy or length of the anchor test. Although all proposed methods showed small percent relative errors, the local kernel equating methods for the nonequivalent groups with anchor test design had somewhat larger standard error of equating than their kernel method counterparts.
19.
Jinghua Liu, Sandip Sinharay, Paul W. Holland, Edward Curley, Miriam Feigenbaum 《Journal of Educational Measurement》2011,48(4):361-379
This study explores an anchor that is different from the traditional miniature anchor in test score equating. In contrast to a traditional “mini” anchor that has the same spread of item difficulties as the tests to be equated, the studied anchor, referred to as a “midi” anchor (Sinharay & Holland), has a smaller spread of item difficulties than the tests to be equated. Both anchors were administered in an operational SAT administration and the impact of anchor type on equating was evaluated with respect to systematic error or equating bias. Contradicting the popular belief that the mini anchor is best, the results showed that the mini anchor does not always produce more accurate equating functions than the midi anchor; the midi anchor was found to perform as well as or even better than the mini anchor. Because testing programs usually have more middle difficulty items and few very hard or very easy items, midi external anchors are operationally easier to build. Therefore, the results of our study provide evidence in favor of the midi anchor, the use of which will lead to cost saving with no reduction in equating quality.