Similar Documents
1.
Researchers have explored a variety of topics related to identifying and distinguishing among specific types of rater effects, as well as the implications of different types of incomplete data collection designs for rater-mediated assessments. In this study, we used simulated data to examine the sensitivity of latent trait model indicators of three rater effects (leniency, central tendency, and severity) in combination with different types of incomplete rating designs (systematic links, anchor performances, and spiral). We used the rating scale model and the partial credit model to calculate rater location estimates, standard errors of rater estimates, model–data fit statistics, and the standard deviation of rating scale category thresholds as indicators of rater effects, and we explored the sensitivity of these indicators to rater effects under different conditions. Our results suggest that it is possible to detect rater effects with each of the three types of rating designs. However, the sensitivity of each indicator differs by type of rater effect, type of rating design, and the overall proportion of raters exhibiting effects. We discuss implications for research and practice related to rater-mediated assessments.

2.
In one study, parameters were estimated for constructed-response (CR) items in 8 tests from 4 operational testing programs using the 1-parameter and 2-parameter partial credit (1PPC and 2PPC) models. Where multiple-choice (MC) items were present, these models were combined with the 1-parameter and 3-parameter logistic (1PL and 3PL) models, respectively. We found that item fit was better when the 2PPC model was used alone or with the 3PL model. Also, the slopes of the CR and MC items were found to differ substantially. In a second study, item parameter estimates produced using the 1PL-1PPC and 3PL-2PPC model combinations were evaluated for fit to simulated data generated using true parameters known to fit one model combination or the other. The results suggested that the more flexible 3PL-2PPC model combination would produce better item fit than the 1PL-1PPC combination.
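For readers unfamiliar with the model combinations compared above, the Python sketch below shows the response functions involved. It assumes the logistic metric without the 1.7 scaling constant and writes the two-parameter partial credit item in generalized-partial-credit form (slope times the cumulative sum of θ minus the step difficulties), which we take to be equivalent to 2PPC up to reparameterization. The function names are illustrative, not from the study. Using a common slope for every item and setting c = 0 gives the 1PL and 1PPC versions.

```python
import math


def p_3pl(theta, a, b, c):
    """3PL probability of a correct response (no 1.7 scaling constant).

    With c = 0 and a common slope for all items this reduces to the 1PL model.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def p_partial_credit(theta, a, steps):
    """Category probabilities for a partial credit item with slope a.

    steps[j] is the difficulty of the step from category j to j + 1.
    A free slope per item gives a 2PPC/GPCM-style item; a slope shared
    by all items gives the one-parameter (1PPC) version.
    """
    z = [0.0]  # category 0 has log-numerator 0
    for d in steps:
        z.append(z[-1] + a * (theta - d))
    top = max(z)  # subtract the maximum to avoid overflow in exp
    weights = [math.exp(v - top) for v in z]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical CR item scored 0-2 and MC item, evaluated at theta = 0.
print(p_partial_credit(0.0, 1.2, [-0.5, 0.8]))
print(p_3pl(0.0, 1.1, 0.3, 0.2))
```

Because each CR item's slope is free under 2PPC, the CR and MC slopes can differ, as the study found they do.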

3.
This article used the multidimensional random coefficients multinomial logit model to examine the construct validity and detect substantial differential item functioning (DIF) in the Chinese version of the Motivated Strategies for Learning Questionnaire (MSLQ-CV). A total of 1,354 Hong Kong junior high school students completed the MSLQ-CV. The partial credit model showed better goodness of fit than the rating scale model. Five items with substantial gender or grade DIF were removed from the questionnaire. The correlations between the subscales indicated that the cognitive strategy use and self-regulation factors were very highly correlated, suggesting that the two factors could be combined. The reliability analysis showed that the test anxiety subscale had lower reliability than the other factors. Finally, the item difficulty and step parameters for the modified 39-item questionnaire are reported. The order of the step difficulty estimates for some items implied that categories might need to be grouped where steps overlap. Based on these findings, directions for future research are discussed.
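The last finding, that disordered step estimates may call for grouping categories, can be illustrated with a small Python sketch. The rule used here is one common heuristic and an assumption on our part: flag any step whose estimate is not larger than the preceding one, then merge the category that is never most probable into its lower neighbour. The article does not say which categories were combined, and the step values below are made up.

```python
def disordered_steps(steps):
    """Return indices j where step j + 1 is not above step j.

    Step j separates category j from category j + 1, so a reversal at j
    means category j + 1 is never the most probable response.
    """
    return [j for j in range(len(steps) - 1) if steps[j + 1] <= steps[j]]


def collapse(responses, j):
    """Merge category j + 1 into category j and shift higher codes down by one."""
    return [x if x <= j else x - 1 for x in responses]


# Hypothetical step estimates for a five-category item (codes 0-4).
steps = [-1.2, 0.4, 0.1, 1.5]
flags = disordered_steps(steps)
print(flags)
print(collapse([0, 1, 2, 3, 4], flags[0]))
```

After recoding, the item would be re-estimated and the new step order checked again.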

4.
The use of surveys, questionnaires, and rating scales to measure important outcomes in higher education is pervasive, but reliability and validity information is often based on problematic Classical Test Theory approaches. Rasch Analysis, based on Item Response Theory, provides a better alternative for examining the psychometric quality of rating scales and informing scale improvements. This paper outlines a six-step process for using Rasch Analysis to review the psychometric properties of a rating scale. The Partial Credit Model and the Andrich Rating Scale Model are described in terms of the psychometric information (i.e., reliability, validity, and item difficulty) and diagnostic indices they generate. The approach is illustrated with authentic data from a university-wide student evaluation of teaching.
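Infit and outfit mean squares are the most widely used of the diagnostic indices such an analysis produces. The Python sketch below computes both for a single item under the Andrich rating scale model. It assumes that person measures, the item location, and the shared thresholds have already been estimated. The function names and example values are illustrative. Values near 1 indicate responses about as variable as the model expects.

```python
import math


def rsm_probs(theta, delta, taus):
    """Andrich rating scale model: item location delta plus thresholds
    taus shared by every item on the scale."""
    z = [0.0]
    for t in taus:
        z.append(z[-1] + (theta - delta - t))
    top = max(z)
    w = [math.exp(v - top) for v in z]
    total = sum(w)
    return [x / total for x in w]


def item_fit(responses, thetas, delta, taus):
    """Return (infit, outfit) mean squares for one item across persons."""
    sq_resid = 0.0   # sum of squared residuals
    variances = 0.0  # sum of model variances (infit denominator)
    z2 = 0.0         # sum of squared standardized residuals
    for x, theta in zip(responses, thetas):
        p = rsm_probs(theta, delta, taus)
        e = sum(k * pk for k, pk in enumerate(p))
        w = sum((k - e) ** 2 * pk for k, pk in enumerate(p))
        sq_resid += (x - e) ** 2
        variances += w
        z2 += (x - e) ** 2 / w
    infit = sq_resid / variances        # information-weighted
    outfit = z2 / len(responses)        # unweighted mean
    return infit, outfit


print(item_fit([0, 1, 2, 2, 1], [-0.5, 0.0, 0.4, 1.2, 0.1], 0.2, [-0.8, 0.8]))
```

Outfit is more sensitive to unexpected responses from persons far from the item's location, while infit weights each residual by its information. For that reason, the two are usually reported together.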

5.
Orlando and Thissen's S-X² item fit index has performed better than traditional item fit statistics such as Yen's Q1 and McKinley and Mills's G² for dichotomous item response theory (IRT) models. This study extends the utility of S-X² to polytomous IRT models, including the generalized partial credit model, the partial credit model, and the rating scale model. The performance of the generalized S-X² in assessing item–model fit was studied in terms of empirical Type I error rates and power and compared to that of G². The results suggest that the generalized S-X² is promising for polytomous items in educational and psychological testing programs.
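The core of S-X² is a Pearson-type comparison within each summed-score group. It compares the observed and model-expected proportions of responses in each item category. The Python sketch below computes only that final statistic. It assumes the expected proportions have already been obtained; in practice they come from the fitted model, typically via the Lord–Wingersky recursion. It also omits the collapsing of sparse cells and the degrees-of-freedom calculation. The group data shown are hypothetical.

```python
def s_x2(groups):
    """Pearson-type S-X^2 statistic for one item.

    groups: one (n_k, observed, expected) tuple per summed-score group k,
    where observed and expected are proportions over the item's categories.
    """
    stat = 0.0
    for n_k, observed, expected in groups:
        for o, e in zip(observed, expected):
            stat += n_k * (o - e) ** 2 / e
    return stat


# Hypothetical three-category item with two summed-score groups.
groups = [
    (120, [0.50, 0.35, 0.15], [0.45, 0.40, 0.15]),
    (80, [0.20, 0.40, 0.40], [0.25, 0.35, 0.40]),
]
print(s_x2(groups))
```

What distinguishes S-X² from Q1 or G² is that grouping is by observed summed score rather than by estimated θ. As a result, the group memberships do not depend on ability estimates.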

6.
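After the financial crisis, the United States passed the Financial Regulatory Reform Act in 2010. The act substantially reformed the rules on credit rating agencies' conflicts of interest and civil liability, and it strengthened the SEC's oversight of rating agencies. The three major U.S. credit rating agencies continue to dominate the international rating market and helped precipitate the European sovereign debt crisis that has unfolded since 2010. As a major developing country, China must ask how to develop and strengthen its own rating industry and increase its voice in the international rating market so as to safeguard its economic sovereignty. The development and reform of the U.S. rating industry offer lessons worth drawing on.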

7.
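The dictation section of TEM4 (Test for English Majors, Band 4) uses a traditional error-deduction scoring method. Error deduction is a negative-marking approach and has several problems. We therefore propose an experimental scoring method: partial credit scoring. Two sets of experimental data were collected, one scored under the TEM4 dictation scheme and the other under the new scheme. We compared the data and also used the partial credit model (one of the Rasch family of models) to analyse how well the experimental scale performs, including model–data fit, person fit, and information functions. The results indicate that the experimental scoring scheme measures the dictation ability of most students well.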

8.
Rating scale items have been widely used in educational and psychological tests. These items require people to make subjective judgments, and these subjective judgments usually involve randomness. To account for this randomness, Wang, Wilson, and Shih proposed the random-effect rating scale model, in which the threshold parameters are treated as random effects rather than fixed effects. In the present study, the Wang et al. model was further extended to incorporate slope parameters, and the new model was embedded within the framework of multilevel nonlinear mixed-effect models. As a result, (1) no new parameter estimation procedures need to be derived, and (2) existing computer programs can be applied directly. A brief simulation study was conducted to assess parameter recovery using the SAS NLMIXED procedure. An empirical example regarding students' interest in learning science is presented to demonstrate the implications and applications of the new model.
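The Python sketch below makes the idea of random thresholds concrete by simulating ratings under a rating scale model with a slope. Each respondent's thresholds equal the fixed thresholds plus normally distributed deviations. This form of the random effects is an assumption made for illustration. The study's exact specification and its NLMIXED estimation code are not reproduced here, and all names are hypothetical.

```python
import math
import random


def category_probs(theta, a, delta, taus):
    """Rating scale model with slope a, item location delta, thresholds taus."""
    z = [0.0]
    for t in taus:
        z.append(z[-1] + a * (theta - delta - t))
    top = max(z)
    w = [math.exp(v - top) for v in z]
    total = sum(w)
    return [x / total for x in w]


def simulate_rating(rng, theta, a, delta, taus, sigmas):
    """Draw one rating with person-specific thresholds.

    Assumption: threshold k for this person is taus[k] + N(0, sigmas[k]^2).
    """
    person_taus = [t + rng.gauss(0.0, s) for t, s in zip(taus, sigmas)]
    p = category_probs(theta, a, delta, person_taus)
    return rng.choices(range(len(p)), weights=p)[0]


rng = random.Random(42)
ratings = [simulate_rating(rng, 0.3, 1.2, 0.0, [-1.0, 0.0, 1.0], [0.5, 0.5, 0.5])
           for _ in range(1000)]
print([ratings.count(k) for k in range(4)])
```

Setting every entry of `sigmas` to zero recovers the ordinary fixed-threshold model. That makes it a convenient baseline when checking parameter recovery.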

9.
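This study adjusts the calculation of the market value of equity and of the distance to default in the KMV model. It then tests the discriminating power of the modified model using data from 30 ST (special treatment) and 30 non-ST companies listed on China's securities market. The results show that the modified KMV model can identify the credit risk of listed companies and is an effective method for rating corporate bonds.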
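For reference, the Python sketch below shows the textbook KMV calculation that such modifications start from. It assumes the Merton model, a one-year horizon, a 3% risk-free rate, and the common default point of short-term debt plus half of long-term debt. Asset value and asset volatility are backed out from equity value and equity volatility by simple fixed-point iteration. The abstract does not describe the study's own adjustments to the equity value and distance-to-default calculations, so they are not reproduced here. The firm figures are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def solve_assets(equity, sigma_e, debt, r, t, iters=1000, tol=1e-12):
    """Back out asset value V and volatility s from the Merton equations
    E = V N(d1) - D exp(-rT) N(d2) and sigma_E * E = N(d1) * s * V."""
    v = equity + debt
    s = sigma_e * equity / v
    for _ in range(iters):
        d1 = (math.log(v / debt) + (r + 0.5 * s * s) * t) / (s * math.sqrt(t))
        d2 = d1 - s * math.sqrt(t)
        v_new = (equity + debt * math.exp(-r * t) * norm_cdf(d2)) / norm_cdf(d1)
        s_new = sigma_e * equity / (norm_cdf(d1) * v_new)
        converged = abs(v_new - v) < tol * v and abs(s_new - s) < tol
        v, s = v_new, s_new
        if converged:
            break
    return v, s


def merton_equity(v, s, debt, r, t):
    """Equity value implied by the Merton model (used to check a solution)."""
    d1 = (math.log(v / debt) + (r + 0.5 * s * s) * t) / (s * math.sqrt(t))
    d2 = d1 - s * math.sqrt(t)
    return v * norm_cdf(d1) - debt * math.exp(-r * t) * norm_cdf(d2)


def distance_to_default(equity, sigma_e, short_debt, long_debt, r=0.03, t=1.0):
    """KMV distance to default, default point = short debt + 0.5 * long debt."""
    dp = short_debt + 0.5 * long_debt
    v, s = solve_assets(equity, sigma_e, dp, r, t)
    return (v - dp) / (v * s)


print(distance_to_default(equity=100.0, sigma_e=0.3, short_debt=60.0, long_debt=40.0))
```

A larger distance to default means the firm's asset value sits more standard deviations above its default point. One would therefore expect ST firms to show smaller values on average.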

10.
11.
A Credit Rating Model for College Students Based on an Improved Fuzzy Algorithm
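Since Chinese commercial banks began offering student loans in 1999, default rates on these loans have remained high, and the program has hit a serious credit bottleneck. This paper therefore designs a credit rating model for student loans based on an improved fuzzy algorithm. We surveyed students at four institutions: Hunan University, Central South University of Forestry and Technology (中南林业), Hunan College of Finance (湖南财专), and Hunan International Economics University (湖南涉外). The surveys yielded a sample of 172 student borrowers, whose credit we then rated with the model. The empirical analysis shows that the students with low credit ratings largely coincide with those who defaulted. The paper also examines the factors with a major influence on students' credit ratings. It concludes with policy recommendations for commercial banks on controlling student-loan risk.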

12.
The article examines theoretical issues associated with measurement in the human sciences and ensuring data from rating scale instruments are measures. An argument is made that using raw scores from rating scale instruments for subsequent arithmetic operations and applying linear statistics is less preferable than using measures. These theoretical matters are then illustrated by a report on the application of the Rasch Rating Scale Model in an investigation into elementary school classroom learning culture.

13.
Item response theory scalings were conducted for six tests with mixed item formats. These tests differed in their proportions of constructed-response (c.r.) and multiple-choice (m.c.) items and in overall difficulty. Some scalings used c.r. item scores that retained the same number of levels as the item rubrics, produced either from single ratings or from multiple ratings that were averaged and rounded to the nearest integer. Others used a single form of c.r. items obtained by summing multiple ratings. A one-parameter (1PPC) or two-parameter (2PPC) partial credit model was used for the c.r. items, and the one-parameter logistic (1PL) or three-parameter logistic (3PL) model for the m.c. items. Item fit was substantially worse with the combined 1PL/1PPC model than with the 3PL/2PPC model. This is because of the former's restrictive assumptions of no guessing on the m.c. items and equal item discrimination across items and item types. Because item discriminations actually varied, the 1PL/1PPC model could produce spuriously inflated estimates of item information for c.r. items with three or more score levels. For some items with summed ratings, information was usually overestimated by 300% or more under the 1PL/1PPC model. These inflated information values resulted in underestimated standard errors of ability estimates. The constraints imposed by the restricted model suggest limits on the testing contexts in which the 1PL/1PPC model can be accurately applied.
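The information function of a partial credit item shows the mechanism behind the inflated information values. In generalized-partial-credit form, it equals the squared slope times the conditional variance of the item score. The Python sketch below uses illustrative names and the logistic metric without the 1.7 constant. It compares an item's information when its slope is constrained to a common value (here 1) with the information it carries when its true slope is lower. It illustrates the direction of the bias only and does not reproduce the study's calibrations.

```python
import math


def category_probs(theta, a, steps):
    """Partial credit category probabilities with slope a."""
    z = [0.0]
    for d in steps:
        z.append(z[-1] + a * (theta - d))
    top = max(z)
    w = [math.exp(v - top) for v in z]
    total = sum(w)
    return [x / total for x in w]


def item_information(theta, a, steps):
    """Fisher information: a**2 times the conditional variance of the item score."""
    p = category_probs(theta, a, steps)
    mean = sum(k * pk for k, pk in enumerate(p))
    var = sum((k - mean) ** 2 * pk for k, pk in enumerate(p))
    return a * a * var


# Hypothetical four-level CR item whose true slope is 0.6.
steps = [-1.0, 0.0, 1.0]
true_info = item_information(0.0, 0.6, steps)
constrained_info = item_information(0.0, 1.0, steps)
print(true_info, constrained_info, constrained_info / true_info)
```

Because standard errors of ability are roughly the inverse square root of test information, overstating each item's information makes those standard errors too small.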

14.
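This study used the multidimensional random coefficients multinomial logit model (MRCMLM), an extension of the Rasch model. It applied the model in a confirmatory factor analysis of three possible ability-dimension structures for a mathematics paper from China's national college entrance examination (gaokao). The analysis identified the best-fitting dimensional model, and a multidimensional item analysis was then performed within that model's framework.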

15.
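This study reviews the reform of credit rating regulation in the United States, the European Union, Japan, and South Korea and analyzes how these regulatory regimes have evolved. It finds that credit rating regulatory reform has focused mainly on three areas: improving the legal framework for regulation, reforming regulatory mechanisms, and resolving conflicts of interest.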

16.
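The key to scoring English oral tests is ensuring scoring reliability. This article summarizes why the intuitive rating scales currently in use have been criticized. It then analyzes two empirically based rating scales developed abroad: Fulcher's fluency rating scale and Upshur and Turner's empirically derived, binary-choice, boundary-definition scales (EBBs). It compares Fulcher's fluency scale with the rating criteria of the TEM-4 Spoken English Test (TEM-4 SET) and discusses whether it could be applied in large-scale oral tests in China. Finally, it sets out the advantages of using empirically based rating scales in oral English teaching and testing for English majors in China.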

17.
This paper examines the potential costs and benefits associated with a risk-sharing policy imposed on all higher education institutions. Under such a program, institutions would be required to pay for a portion of the student loans on which their students defaulted. I examine the predicted institutional responses under a variety of possible penalties and institutional characteristics using a straightforward model of institutional behavior based on monopolistic competition. I also examine the impact of a risk-sharing program on overall economic efficiency by estimating the returns to scale for undergraduate enrollment (as well as other outputs) in each of ten educational sectors. My estimates suggest that a risk-sharing program would induce only a modest tuition increase, with considerable heterogeneity across sectors. Two different penalty structures are analyzed in the context of the model, and alternative institutional responses such as tuition discounting and credit rating of students are discussed.

18.
This study describes several categories of rater errors (rater severity, halo effect, central tendency, and restriction of range). Criteria are presented for evaluating the quality of ratings based on a many-faceted Rasch measurement (FACETS) model for analyzing judgments. A random sample of 264 compositions rated by 15 raters and a validity committee from the 1990 administration of the Eighth Grade Writing Test in Georgia is used to illustrate the model. The data suggest that there are significant differences in rater severity. Evidence of a halo effect is found for two raters who appear to be rating the compositions holistically rather than analytically. Approximately 80% of the ratings fall in the two middle categories of the rating scale, indicating that the error of central tendency is present. Restriction of range is evident when the unadjusted raw score distribution is examined, although this rater error is less evident when adjusted estimates of writing competence are used.
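The many-facet Rasch model behind this analysis adds a rater severity term to the rating scale model. The log-odds of a rating in category k rather than k − 1 is therefore θ (writer) minus δ (domain) minus λ (rater severity) minus τ_k. The Python sketch below, with hypothetical names and values, computes those category probabilities. It also computes the share of ratings in the two middle categories, the simple descriptive check behind the central tendency finding.

```python
import math


def facets_probs(theta, delta, severity, taus):
    """Category probabilities under a many-facet rating scale model:
    log(P_k / P_{k-1}) = theta - delta - severity - taus[k-1]."""
    z = [0.0]
    for t in taus:
        z.append(z[-1] + (theta - delta - severity - t))
    top = max(z)
    w = [math.exp(v - top) for v in z]
    total = sum(w)
    return [x / total for x in w]


def expected_rating(theta, delta, severity, taus, lowest=1):
    """Model-expected rating on a scale whose lowest category is `lowest`."""
    p = facets_probs(theta, delta, severity, taus)
    return sum((lowest + k) * pk for k, pk in enumerate(p))


def middle_share(ratings, lowest, highest):
    """Proportion of ratings in the two middle categories of an even-length scale."""
    n = highest - lowest + 1
    if n % 2:
        raise ValueError("scale needs an even number of categories")
    middle = {lowest + n // 2 - 1, lowest + n // 2}
    return sum(r in middle for r in ratings) / len(ratings)


# Same writer and domain scored by a lenient and a severe rater on a 1-4 scale.
taus = [-1.5, 0.0, 1.5]
print(expected_rating(0.5, 0.0, -0.5, taus), expected_rating(0.5, 0.0, 0.8, taus))
print(middle_share([2, 3, 3, 2, 1, 3, 2, 4, 3, 2], lowest=1, highest=4))
```

Because severity enters the model additively, a writer's competence estimate can be adjusted for the particular raters who scored the composition. This is why restriction of range looks less severe in the adjusted estimates.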

19.
The purpose of this study is to investigate the effects of missing data techniques in longitudinal studies under diverse conditions. A Monte Carlo simulation examined the performance of 3 missing data methods in latent growth modeling: listwise deletion (LD), maximum likelihood estimation using the expectation and maximization algorithm with a nonnormality correction (robust ML), and the pairwise asymptotically distribution-free method (pairwise ADF). The effects of 3 independent variables (sample size, missing data mechanism, and distribution shape) were investigated on convergence rate, parameter and standard error estimation, and model fit. The results favored robust ML over LD and pairwise ADF in almost all respects. The exceptions included convergence rates under the most severe nonnormality in the missing not at random (MNAR) condition and recovery of standard error estimates across sample sizes. The results also indicate that nonnormality, small sample size, MNAR, and multicollinearity might adversely affect convergence rate and the validity of statistical inferences concerning parameter estimates and model fit statistics.

20.
Standard setting methods such as the Angoff method rely on judgments of item characteristics; item response theory empirically estimates item characteristics and displays them in item characteristic curves (ICCs). This study evaluated several indexes of rater fit to ICCs as a way to judge how accurately raters estimate expected item performance for target groups of test-takers. Simulated data were used to compare adequately fitting ratings to poorly fitting ratings at various target competence levels in a simulated two-stage standard setting study. The indexes were then applied to a set of real ratings on 66 items evaluated at 4 competence thresholds to demonstrate their relative usefulness for gaining insight into rater "fit." Based on analysis of both the simulated and real data, it is recommended that fit indexes based on the absolute deviations of ratings from the ICCs be used, and those based on the standard errors of ratings be avoided. Suggestions are provided for using these indexes in future research and practice.
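A fit index of the recommended kind can be sketched in Python as follows. It takes the mean absolute difference between a rater's Angoff-style probability judgments and the probabilities implied by the items' ICCs at the target competence level. The sketch assumes 3PL ICCs on the logistic metric and uses hypothetical item parameters and ratings. The abstract does not give the study's exact index definitions, so this is one plausible form rather than the authors' formula.

```python
import math


def icc_3pl(theta, a, b, c):
    """3PL item characteristic curve (no 1.7 scaling constant)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def mean_abs_deviation(ratings, items, theta_target):
    """Mean |rating - ICC(theta_target)| over items; lower means better fit.

    ratings: a rater's judged probabilities of success for target examinees.
    items: (a, b, c) tuples for the same items, in the same order.
    """
    devs = [abs(r - icc_3pl(theta_target, a, b, c))
            for r, (a, b, c) in zip(ratings, items)]
    return sum(devs) / len(devs)


items = [(1.0, -0.5, 0.2), (1.3, 0.0, 0.15), (0.8, 0.7, 0.25)]
ratings = [0.70, 0.55, 0.40]
print(mean_abs_deviation(ratings, items, theta_target=0.0))
```

An absolute-deviation index stays on the probability scale of the ratings themselves, which makes it easy to interpret for panelists.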


Copyright © Beijing Qinyun Technology Development Co., Ltd. ICP registration 京ICP备09084417号